© Distribution of this video is restricted by its owner
00:02 | Okay, so I will talk about fourier transforms today. And the reason |
|
|
00:13 | that One of the teams when I to do a project on the 50's |
|
|
00:20 | huh. As the acronym is for fourier transform. And uh my understanding |
|
|
00:29 | that the team is adventurers and don't a whole lot of background in terms |
|
|
00:37 | their 50s. And it's um one the most widely used computational procedures or |
|
|
00:47 | of anything. So I'll decide to about it and I'll try to done |
|
|
00:54 | once I described what it is, of the techniques were used to try |
|
|
00:59 | get good performance on processors. please. I love all right. |
|
|
01:11 | there's kind of an outline of So first introduced the discrete fourier |
|
|
01:18 | that is the basis for the fast transform and then look at how F |
|
|
01:28 | actually is working or put together. and mhm That's pretty much it. |
|
|
01:41 | motivation. So why one should be and quite important in many disciplines are |
|
|
01:50 | their 50s. This is an example original wide range of different types of |
|
|
01:58 | where it's used. So it on left hand side is kind of electron |
|
|
02:05 | microscopy that is basically dealing with They are highly very noisy images as |
|
|
02:18 | thing with the young go circle shows it looks like just noise but it |
|
|
02:24 | . And to try to make something out of these. I didn't always |
|
|
02:27 | images. A 15 placing key role eventually one can actually reconstruct images of |
|
|
02:37 | like the one at the bottom on left hand side. Another part of |
|
|
02:43 | you may be involved with colleagues in department they are interested in it, |
|
|
02:51 | resonance imaging or admire are I for uh in that case uh 50 is |
|
|
02:59 | the critical role because that's the way actually managed to get an image of |
|
|
03:05 | going on on or the structure of brain. Others are kind of more |
|
|
03:11 | the single processing domain and just show examples here in astronomy that tends to |
|
|
03:17 | very large 50s and billions of data and then on the right hand side |
|
|
03:26 | is what's typical for The use of . 50s and American competitions for solving |
|
|
03:37 | forms of for instance fluids problems are finding out what goes on and some |
|
|
03:43 | to make. And the slide is to illustrate that F. F. |
|
|
03:53 | . S. And also were used a very long time to do |
|
|
03:59 | So it turns out that the four basis with just two econometric functions like |
|
|
04:07 | and co signs are very good set basis functions as it's known in many |
|
|
04:17 | image processing and all some of This isn't solving Pds and many types |
|
|
04:25 | functions can be represented as a serious sign or co some functions and this |
|
|
04:34 | of shows the case of the compression upper left them E. Is an |
|
|
04:44 | of a bunch of pixels and then rest of them from B. |
|
|
04:49 | D. and E. And Is um representing sort of the fourier |
|
|
04:56 | on that. A set of pixels the pictures of the IV. And |
|
|
05:04 | upon how many components of the four serious one users from yet the not |
|
|
05:11 | great representation of E or one can get pretty good representation of the for |
|
|
05:17 | lot less than just communicating or using the pixels. So instead of during |
|
|
05:25 | pixels and it's not clear from the slide weather 10 24 is the size |
|
|
05:35 | civilian pixels that probably what it is represent the so in this case instead |
|
|
05:41 | a million pixels and he is pretty than Earth is indeed a better representation |
|
|
05:47 | E. And for F. Arizona 400, coefficients that pretty much gets |
|
|
05:55 | good the resolution as the pixel So that um and what the direct |
|
|
06:03 | that was used in the mpeg and play standards or what's known as sign |
|
|
06:09 | coastline transforms. Um Now therefore for transform, but it's based on before |
|
|
06:16 | transform, so that was a motivational or of four FTS. Um So |
|
|
06:29 | we talk about the discrete fourier transform the discrete fourier transform is simply a |
|
|
06:35 | version of The four Year Transform or trying to do before we transform that |
|
|
06:43 | kind of defined on the top, is based on an integral of a |
|
|
06:48 | that have F of X. And are these four year basis. The |
|
|
06:56 | and cosine is often represented by Roots unity as they're called or E to |
|
|
07:01 | -1 something. So so that leads to the discrete fourier transform and I'm |
|
|
07:17 | going through here, how you go the continuous domain to the discrete |
|
|
07:22 | So we just have to take it face value that if you do it |
|
|
07:26 | in America one ends up with is ID for your transform that is in |
|
|
07:32 | middle of the image and sorry for rotation. So um for the bottom |
|
|
07:39 | of the image F of X is with um lower case X. So |
|
|
07:50 | the E to the minus. And yeah. Oh my God max |
|
|
07:55 | replaced by this or Madam piece. that you can see that has some |
|
|
08:05 | , what's on top for the continuous analytical fourier transform. So um the |
|
|
08:18 | the collection that discreet from your transform for each capital X. Indexed or |
|
|
08:26 | em the right hand side is simply a product as you do that product |
|
|
08:35 | this over the piece. But the values to sample some one has the |
|
|
08:42 | then um it is simply in the and then and repeat that for a |
|
|
08:53 | of and values that covers A certain from 0 to the mine is wonderful |
|
|
08:59 | enumerated again discrete values. So it's social in the next side it ends |
|
|
09:09 | being simply made to expect an the inverse discrete for a transform is |
|
|
09:17 | very analogous as you can see it's uh maybe it's factored multiplication something so |
|
|
09:26 | that, let's look at that what matrix actually looks like. So the |
|
|
09:33 | is then the omega piece to the of M times J. So um |
|
|
09:43 | if M is zero, that is first drill in this matrix, The |
|
|
09:50 | of Omega P is zero regardless of . So that means um If the |
|
|
09:57 | zero then regardless of what the home that are than the exponential Haitian results |
|
|
10:06 | the wants, the first row is . It's also the case that the |
|
|
10:12 | column is a one because then Jay zero and that's a summation index of |
|
|
10:19 | semester is X. Of zero is multiply by one. So on her |
|
|
10:27 | um particular structure where now the other entries is for the second role Mm |
|
|
10:38 | one. So it as you go and through the inner product than the |
|
|
10:45 | exponents of the moment. A piece simply the j value. That then |
|
|
10:51 | from 1, 2 p -1 of different start in the first column from |
|
|
10:55 | to P -1 And then one has . Yes, plug in the |
|
|
11:02 | And the jenny, someone got the of this W matrix representing the uh |
|
|
11:13 | of unity or the omega keys in a discreet for your transform and the |
|
|
11:25 | of interesting property of the fourier transform the discrete fourier transforming them to be |
|
|
11:33 | . In this case said the matrix is simply the inverse of uh the |
|
|
11:41 | entries in the forward transform. And fairly easy to show that in fact |
|
|
11:46 | is true. Yeah, so that's it's quite easy to compute in verse |
|
|
11:56 | you transform if it actually has what's then called the forward transform. So |
|
|
12:07 | questions on the G. F. . And may be familiar to or |
|
|
12:13 | but it's not, you're welcome to questions If not, I will turn |
|
|
12:26 | one. So to the first floor transform that this thing that um used |
|
|
12:36 | extensively in many disciplines. So I even compete in my matrix multiply or |
|
|
12:45 | even it's more frequently Houston matrix So the first thing is just to |
|
|
12:53 | out that there is the cause of um popularity of this algorithm. The |
|
|
13:01 | comes from it competition all efficiency. unlike the Matrix electoral products, that |
|
|
13:10 | basically order P squared estimate tricks exercise times P before we transform is that's |
|
|
13:21 | proportion of the P times log P of P square, if the |
|
|
13:27 | probably nobody cares. But if p a billion or more points the difference |
|
|
13:34 | large as I will show on the slide. So the transport was actually |
|
|
13:44 | discovered by the all physicists and I uh later we discovered, I don't |
|
|
13:54 | the gap in years between the original the cool it. Okay. Um |
|
|
14:03 | on the festival you transform that somehow got the name recognition for this way |
|
|
14:10 | computing. The discrete fourier transform uh with the notion always can different radi |
|
|
14:21 | are I guess it's called so it's to four and eight and I |
|
|
14:27 | show you that and that's actually important understand because that's a property of deriving |
|
|
14:34 | algorithm that is the basis for getting efficient implementations of the Cooley to can |
|
|
14:49 | and the cool it to transform as will see, I want to talk |
|
|
14:53 | it is restricted to particular sizes of dataset or the input factor accident doesn't |
|
|
15:04 | for an arbitrary length factor. It fact only applies the power of two |
|
|
15:10 | factors so people and uh I think with it and come up with other |
|
|
15:17 | and which one is known as a race addict that can be adopted them |
|
|
15:23 | be more flexible in terms of input sizes. Another one is rader's algorithm |
|
|
15:30 | then there are particularly clever one for when the input factor is of a |
|
|
15:40 | length or can be factors in the of crimes and then continues this prime |
|
|
15:48 | over them. Um mhm So I going to talk about the quality of |
|
|
15:57 | . That's that's the simplest one. and probably no also for the project |
|
|
16:06 | someone you will do and there are trick Aries that is being played. |
|
|
16:13 | Remember from the definition of the discrete transform it had this video factors about |
|
|
16:23 | play the you can see here on top right hand corner. Let's eat |
|
|
16:29 | in this case minus to pay high where so that means the homemade apiece |
|
|
16:35 | complex numbers. So yeah. it turns out that the input |
|
|
16:47 | mhm real and not complex numbers then can actually take advantage of that To |
|
|
16:55 | about a factor of two in operations memory. And I may be able |
|
|
17:00 | make some comments on that before the is over and then there is further |
|
|
17:07 | uh efficiencies that can be again, the data has certain symmetry properties and |
|
|
17:19 | it reduces more or less by another of two in terms of work. |
|
|
17:25 | And on Houston so called signing coastline and that's as I mentioned, was |
|
|
17:32 | basis for compression algorithms that jpeg and standards for many years now I want |
|
|
17:38 | switch to using way that transforms into and the other two things. Self |
|
|
17:48 | and in place. Self sorting a bit obscure at this point but it |
|
|
17:53 | become clear as I talked about it place simply me in some fun um |
|
|
18:02 | allocate separate members for input and So which would kind of double the |
|
|
18:10 | requirements and if you have a billion transforms doubling it. It's kind of |
|
|
18:16 | necessarily what you want to do. that's what's called in place algorithms. |
|
|
18:23 | they worked in an array that starts the input data and when you're said |
|
|
18:29 | done it has the transformer, the data without using much of temp |
|
|
18:40 | So here is just the pointing out the means to do something that is |
|
|
18:46 | important benefit or the p squared versus P log p. So as I |
|
|
18:54 | , uh astronomers are the ones I I'm not use not just the one |
|
|
19:02 | to 2 30 is a billion points they use many billions of points |
|
|
19:09 | I happen to have work with some dana way way back In the 90s |
|
|
19:18 | at that point already did multiple being transforms and the difference between square and |
|
|
19:29 | is quite significant as you can so and the number five instead of |
|
|
19:38 | L O P will be fine So that's in fact what it is |
|
|
19:44 | call it. Take the transform. now I'm going to start to look |
|
|
19:55 | how this FFT is can be the and get some insight into what it |
|
|
20:02 | and what the cleverness is but I'll for a second before diving into detail |
|
|
20:10 | there are questions it's not. So So there are two versions of the |
|
|
20:28 | it okay, there are many Russians in terms of how they kind of |
|
|
20:35 | in how the memory references kind of . Yeah. Typically called the decimation |
|
|
20:44 | frequency and decimation in time. And will show both of them the most |
|
|
20:53 | the literature from anyone reads papers or know it. They tend to favor |
|
|
21:00 | decimation in time and they don't mentioned in terms of decimation in frequency. |
|
|
21:07 | it turned out it turns out that terms of parallel computing in particular if |
|
|
21:14 | do large FTS on clusters, it's to understand both. And hopefully pointing |
|
|
21:27 | out before the lecture Is over why should know both. All right. |
|
|
21:36 | yes, now I'm trying to show have to There, I know what |
|
|
21:41 | basis is for their 50 and how get from something that is or the |
|
|
21:47 | square to something that is full of . So it all depends on the |
|
|
21:53 | that this matrix the w matrix the of the matrix or interests uh unique |
|
|
22:05 | and that's what's being exploited. And first formula transformed. So we'll talk |
|
|
22:14 | how to exploit it. I'll start the thing that's true results in the |
|
|
22:23 | and supposed to that the destination in . So for the decimation in frequency |
|
|
22:30 | thing is to observe as this particular of once known as the twitter factors |
|
|
22:38 | the roots of unity the omega p we're probably most of the trunk. |
|
|
22:45 | the notion of twitter factors And that uh huh. Talking about the end |
|
|
22:51 | the piece and the notion why they're roots of unity is illustrated on the |
|
|
22:58 | hand side. So which is a of the unit circle in the |
|
|
23:03 | The main and the omega piece basically this unit circle and the Lord of |
|
|
23:12 | P. Is the finer or the the angle is for each one of |
|
|
23:18 | slices. Remember that on a happy it to the -2? I divided |
|
|
23:26 | peak. So Pete tells you how uh slices there are as you go |
|
|
23:32 | the human circle. No, since omegas are A to the -2 pi |
|
|
23:44 | over P. Um When you take power omega p To the people of |
|
|
23:55 | two power. Since I should have that on the slide. Sorry. |
|
|
24:00 | So e to the -2 Parts are about 30 times. T over two |
|
|
24:07 | simply E. To the minus um hi B. Two. So that |
|
|
24:17 | you go half away around the unit , So that part of our monarchy |
|
|
24:25 | simply -1. So because you think as you increase the power or made |
|
|
24:34 | p, you kind of add one at the time and P over |
|
|
24:39 | you're halfway around the unit circle. um as you keep increasing the power |
|
|
24:45 | you get that's the one. Now the decimation of frequency one can look |
|
|
24:52 | the sum that they had, that basically take a roll of the dft |
|
|
25:01 | , that is the inner product. the inner product then is the omega |
|
|
25:08 | too. The power that is a of withdrawal you're working on. And |
|
|
25:15 | for the, going through the columns the matrix and multiply it by the |
|
|
25:20 | factor. So if you do you can split up the sum In |
|
|
25:25 | first half of the inner product, is from 0 to Pi over 2 |
|
|
25:31 | . Uh And then you do the half. And the way I did |
|
|
25:35 | second half of this line is I the same summation index. But then |
|
|
25:42 | the summation expression I replace jay from previous language, A plus P over |
|
|
25:51 | . So that covers the second part the summation range for the inner |
|
|
25:58 | So now I can use the properties this omega piece that when you take |
|
|
26:06 | the power of P over two is . So that means in this submission |
|
|
26:15 | You have -1. And I guess an excellent sorry, extra crime here |
|
|
26:23 | shouldn't be so I just discovered that . So I apologize. So, |
|
|
26:31 | and then so I just added these guys made it just one summation sign |
|
|
26:39 | it has the omega lP, that this part. Um And it's the |
|
|
26:46 | in the first half of the inner . And the difference for the second |
|
|
26:51 | is that this guy multiplies the expanse for the second part of the |
|
|
26:59 | so I can play uh for the here, someone consider also in an |
|
|
27:05 | . Values and will become clear when doing that in a little bit. |
|
|
27:11 | now if I look at the and even so this is best for the |
|
|
27:21 | here is the even ones and this be the odd ones. So if |
|
|
27:28 | , plug in uh to our prime of L. In the so upper |
|
|
27:36 | a quick slide equation. And then it's an even power of L then |
|
|
27:46 | to an even power is always plus . So then this expression simplifies to |
|
|
27:54 | part and for the other half the is odd. So then Basically |
|
|
28:02 | So in that case you get essentially expression and the other part that was |
|
|
28:11 | here since we have Formula P 2 crime one point as well. Um |
|
|
28:23 | at this six. Turn it into a repeat of the two as the |
|
|
28:30 | part. So remember um this I'm sorry, I'll need to flip |
|
|
28:38 | . Yes. Yeah. So here the armor to remove that. So |
|
|
28:46 | um if you have an even power this uh or kill me too, |
|
|
28:55 | can sorry if you have a two here then you can as well move |
|
|
29:04 | down and They would be over two the denominator, that's kind of what |
|
|
29:10 | exercises here. So at this point looks perhaps like totally meaningless manipulations but |
|
|
29:19 | not. Um So the difference why not is that we started out rested |
|
|
29:33 | this in a product here that is uh have p terms for every question |
|
|
29:43 | every L. So this is and enumerated for all the ills and we |
|
|
29:48 | this matrix vector application that we started . Now what they have here is |
|
|
29:59 | inner product. That is only of half feeling but we got two of |
|
|
30:07 | because there's one for all of them either. So it may not seem |
|
|
30:13 | much of again that I want to . So this is that it's not |
|
|
30:20 | have sites. So what's the Well, I guess first course, |
|
|
30:27 | on. So this is the notion the butterfly. So this is just |
|
|
30:33 | what's being done here. Right? to do each of these two have |
|
|
30:39 | . Dft is what's needed to be is to replace the input by um |
|
|
30:46 | these two input values here. Um the other part you have the take |
|
|
30:53 | difference between the same input values and you must apply it with this. |
|
|
30:58 | my God, jay a woman up to the power of Jack, this |
|
|
31:05 | , the Amadeus here in front is same invoked. So that's you cannot |
|
|
31:10 | ends up with the to do these slightly different input factors. One has |
|
|
31:15 | as an input factor and the other has the omega p to the power |
|
|
31:20 | times the difference as the imperfection. that's what's kind of illustrated on this |
|
|
31:28 | . So now you have the two and one can draw a picture of |
|
|
31:32 | and then it becomes known as saying . Uh people talk about butterfly competition |
|
|
31:42 | talk about butterfly networks when it comes interconnection networks and it's all because of |
|
|
31:49 | like the drugs relationship between the input the output there. So here is |
|
|
31:59 | of what we just did in terms um This generating the to have size |
|
|
32:10 | or uh yeah, so um and way we did it is that There |
|
|
32:20 | one for even components of the output there was one for the components of |
|
|
32:25 | output fractiousness run for the gun the X. The even ones and |
|
|
32:31 | there's the bottom one is the odd . So this is what was |
|
|
32:40 | And the first thing with all the kind of thing. And solid bullets |
|
|
32:46 | just this collection of butterflies. That the computation required to reduce it. |
|
|
32:54 | the requirement to to half size F. T. S. But |
|
|
33:00 | I can apply the same procedure over over again To these half size of |
|
|
33:09 | . So if one does that, ends up with something like this. |
|
|
33:15 | now we have and by doing recursive, lee dividing it while generating |
|
|
33:24 | size FTS that means after logarithmic number steps you're kind of done. So |
|
|
33:32 | why the depth in some sense of . FFT a butterfly network or stages |
|
|
33:40 | the competition this logarithmic and for each the number of operations is proportional |
|
|
33:49 | The number of data funds. So the way you want to insult. |
|
|
33:55 | the algorithm being of order peel api the mountain rotation. Today we're in |
|
|
34:03 | from using the normal thing. There's things to observe on this slide that |
|
|
34:15 | with the input and I normal border just taking the Integrates and you know |
|
|
34:24 | 0 to 1615 in this case for input values. But because of this |
|
|
34:32 | even splitting we respect and the it turns out that the output is |
|
|
34:44 | referred to as scrambles. Um So means for instance if we look at |
|
|
34:57 | we take the lines as actual memory so in the first memory location then |
|
|
35:06 | the output for an in place algorithm um zero frequency components is in the |
|
|
35:16 | location as the zero input value but next memory location that used to have |
|
|
35:24 | second input value. Now it has the second frequency component but Frequency Capone |
|
|
35:34 | eight and it turns out that this scrambled order or lucic unscrambled order is |
|
|
35:44 | fact well defined as every traverse order I will talk more about that before |
|
|
35:53 | done and I'll probably I'll stop at point here and but let me see |
|
|
36:00 | my next flight is first. Um , so let me just make some |
|
|
36:09 | on this. And this is particularly When one does distributed the fifties and |
|
|
36:24 | it is looking at these so called the factors for roots of unity. |
|
|
36:32 | in this particular celebration did in the butterfly stage here, the roots of |
|
|
36:44 | corresponds to is slicing of the upper of the unit circle into peace |
|
|
36:56 | We only need half of it because used this property that in this decimation |
|
|
37:06 | frequency algorithm that they looked at data in the input, that is half |
|
|
37:13 | the input factor away from the So those twiddle factors, the fact |
|
|
37:20 | replaced by a minus times the total on the upper half of the unit |
|
|
37:28 | . Now as we proceed to the stages than data was of the half |
|
|
37:35 | size and there were two of them each half size. Fft used the |
|
|
37:42 | uh that the federal factors of the half it doesn't know whether it's other |
|
|
37:47 | input data or what not. This knows about the size and since the |
|
|
37:54 | now is half of the input. in terms of the power of the |
|
|
38:03 | subjects set the twiddle factors. There's half as many slices or in terms |
|
|
38:10 | exponents on his twitter factors. It go 123 anymore, but it goes |
|
|
38:17 | 0, 2, 4 and basically twice the value of the twitter |
|
|
38:26 | Exponent in the 1st stage. And tell them what you're saying is also |
|
|
38:33 | means the people of factors used in second stage, it's a subset and |
|
|
38:39 | every other people in fact produced in first stage and then it goes further |
|
|
38:45 | . So then it becomes around only two unless every other of the |
|
|
38:50 | stage. So, and in the it's just a very simple bath |
|
|
38:59 | Yeah, So one more comment and I'll read through my comments under the |
|
|
39:06 | factor setting. So, um what and fft applications, one does many |
|
|
39:18 | f t s of the same size that's so that's what we typically |
|
|
39:28 | You compute the trader factor once and you re use them for every Application |
|
|
39:36 | 50s of the same sex. So an area where you included for the |
|
|
39:44 | that computation in your code, but not a good way of doing it |
|
|
39:50 | . So you want first to uh , compute them and spend a little |
|
|
39:57 | of memory or the memory to um them and then just use them every |
|
|
40:07 | of the fft and that's where the is important to know how these things |
|
|
40:15 | in the context of distributed FTS because typically different little factors are allocated in |
|
|
40:24 | nodes of the cluster and you want be able to make sure that they |
|
|
40:29 | in the right place for where they needed. All right. And this |
|
|
40:36 | pretty much what I read is. , but they did not say is |
|
|
40:43 | . Well kind of if one has computed the total factors and store them |
|
|
40:49 | an array Because of this part or property on the cool. It took |
|
|
40:54 | of 50 one can use the address to figure out what the exponents are |
|
|
41:00 | this is simply what um, is the bullet points here says that it's |
|
|
41:08 | step off the most significant bit and into the race um, when you |
|
|
41:16 | to retrieve the proper total factor, questions on that before I talk about |
|
|
41:24 | decimation in time. Okay. So is also This otherwise decimation in time |
|
|
41:40 | tend to be the one that shows most of the time in textbooks about |
|
|
41:44 | 50th as well as often in some people develop efficient algorithms were used |
|
|
41:52 | some competition. So here is now hmm. Wherever one looks since that |
|
|
42:02 | even input it. So in this the input factor x is petition |
|
|
42:11 | into looking at the even components in first submission here And help some a |
|
|
42:19 | of -1 such woman peanut yet. sorry about that. And then the |
|
|
42:24 | components of the input factor. Um then, so this is just again |
|
|
42:32 | inner product but rewritten in for even odd components of the input factor and |
|
|
42:40 | the same game. Uh Bye have . Uh Okay prime them being two |
|
|
42:47 | promptly even so I can take these and move it down to uh from |
|
|
42:55 | know from the top to making a . Over two instead. Someone now |
|
|
42:59 | kind of half size in a product also use some fiddle factors that are |
|
|
43:05 | half size half Ephesians. All So this is Oh just but now |
|
|
43:09 | -1 makes sense. Hopefully mm. And now one looks instead of 1st |
|
|
43:18 | 2nd half of the output vector, capital. So under the decimation of |
|
|
43:26 | they love that input their points halfway and not even put values. Now |
|
|
43:38 | at what even input values and um values separated about half of the number |
|
|
43:46 | outputs families. Otherwise it's kind of in the manipulation. It's very |
|
|
43:54 | So if one does it this way me kind of derived the application of |
|
|
44:03 | divide and conquer scheme. Starting with output and now we used half |
|
|
44:12 | Fft is based on even or odd values but it's still kind of a |
|
|
44:18 | competition. The butterfly looks a little different in the sense that the complex |
|
|
44:25 | happens before the actual but if I on the decimation in frequency that happened |
|
|
44:31 | the output, one of the outputs the butterfly. So it's not particularly |
|
|
44:37 | but it's very similar and then one again continue to do the same exercise |
|
|
44:43 | and then man get something like So now, you know, we |
|
|
44:50 | with basic derivation from the output that had a normal order and then figure |
|
|
44:58 | this obvious split that happened in every step. It in fact resulted in |
|
|
45:05 | the kind of our input yes, to be in a bit traversed order |
|
|
45:16 | order for the output to be in order. There is a a few |
|
|
45:28 | things to comment on and I don't exactly my sights here but to be |
|
|
45:35 | here is now for this decimation in derivation all the twiddle factors spent along |
|
|
45:45 | After a 1/2 circle is used for last butterfly stage instead of the |
|
|
45:54 | So the kind of order in which title of factors are used is reversed |
|
|
46:01 | to the decimation in frequency. They're and that's kind of one important fact |
|
|
46:12 | in particular in the parallel computing or fFT competitions otherwise. So I think |
|
|
46:26 | so this is just script mentioned that and here's kind of but I also |
|
|
46:32 | talked about and harmon then use the address or in stage number to figure |
|
|
46:42 | uh huh how to index into a of physical factors. Let's see what |
|
|
46:51 | next time is. I guess it's the two that's already done in talking |
|
|
46:58 | them. Um So um the way derived it for the decimation in frequency |
|
|
47:12 | they input was a normal order and the output being a bit reversed |
|
|
47:20 | Um The things were totally opposite for decimation in time And I also commented |
|
|
47:27 | Howard 20 factors are being used now the fingers, right? The I |
|
|
47:44 | to talk about the last comment on slide, I think on the next |
|
|
47:51 | . So this basically showed that the on the left hand signers to instances |
|
|
48:05 | the destination of frequency on this there is the left side that is |
|
|
48:09 | what was previously in the derivation of Decimation of frequency at 15 but it's |
|
|
48:17 | fine to kind of commute the lions you like in the but if I |
|
|
48:29 | so then what happens if you were do that? Which is essentially to |
|
|
48:33 | well instead of having the import vector to memory in normal order, if |
|
|
48:39 | assign the values to memory location and d traversing the index of the input |
|
|
48:48 | , then you would get the output normal order. So, so the |
|
|
48:55 | of this is to say, well is what is often not pointed |
|
|
49:02 | is that one could as values the in frequency, as a decimation in |
|
|
49:11 | . If the data presented as its in vitro first order and the |
|
|
49:21 | stand up also for the decimation in even get the output in normal. |
|
|
49:32 | another point that I wanted to make I haven't been before. So whether |
|
|
49:39 | looks at the left slide, the butterflies combines input values that are half |
|
|
49:49 | apart. X zero and X. . Is part of one. Butterfly |
|
|
49:54 | and 9 is the input to another . And that obviously doesn't change in |
|
|
50:00 | of this, the allocation of memory the competition needs to be the |
|
|
50:06 | So in the first stage and the and fire and frequency one combines in |
|
|
50:16 | buffet computation input values. There are the input director apart from each |
|
|
50:27 | Now the fun. Alright. Uh I serve now if you look at |
|
|
50:32 | decimation in time left his side is what Peter ended up within driving the |
|
|
50:40 | in time. 15 and the same . Yes. To the traversal of |
|
|
50:47 | assignment values, the memory location and get the right hand picture that shows |
|
|
50:53 | . So if you have there in order has important Eucharist values the destination |
|
|
50:59 | tiredness. The decimation in frequency. the consequences that now the output to |
|
|
51:04 | traverse for So they're both are perfectly to use regardless whether the input orders |
|
|
51:14 | and be traversed. The other point again like I pointed out for the |
|
|
51:22 | in frequency on the previous slide. even for the decimation in time. |
|
|
51:27 | the first stage you combine input There are half of the infant sector |
|
|
51:36 | . So the set of data values are combined as you go through the |
|
|
51:41 | of the FFT follows the same Whether you use decimation in time or |
|
|
51:46 | in frequency. That's another part that important to understand. And so the |
|
|
51:57 | reference pattern does not change whether you decimation in time, the decimation in |
|
|
52:03 | , it depends if the input order normal or be traversed but the combination |
|
|
52:13 | input values that is used in the of our stages. This is the |
|
|
52:19 | . Yeah the butterflies are slightly So how the trailer factors are used |
|
|
52:25 | different but it's the same fiddle frankness just the order in which their use |
|
|
52:32 | different. Yeah, so that um just to comment on this and then |
|
|
52:43 | will stop so just to be kind obvious um just illustrating what the controversial |
|
|
52:53 | in terms of having a binary representational um and it just best there so |
|
|
53:03 | look at the binary digits in the cold column and you look at in |
|
|
53:11 | bitter for asbestos, if you write from left to right then it was |
|
|
53:16 | of write them down in the opposite when you do to be controversial. |
|
|
53:19 | nothing more to it than that. it's very well defined what it |
|
|
53:23 | Um It's yes uh kind of do reflection on the bits around the need |
|
|
53:34 | So now comment on this and then was gone again and see if there |
|
|
53:40 | questions. So the victory reversal is that has received a lot of interest |
|
|
53:54 | again, interferes in one of the common algorithms out there and sometimes people |
|
|
54:02 | to have the same input and output . That means we need to add |
|
|
54:06 | stage that restores the order that was in the input and that is the |
|
|
54:15 | . Now people also want to not have temporary race to have So they |
|
|
54:22 | to do it in place like they to hear 50 itself and then it |
|
|
54:27 | somewhat tricky. So there are many um about how to do the traverse |
|
|
54:37 | in place to avoid a large amount temp storage and their papers there's also |
|
|
54:46 | book on for fast fourier transform but band on uh on a university that |
|
|
54:53 | a very good reference for both the fifties. The other point to be |
|
|
55:00 | of As a woman look how clusters Mandelson distributed their 15 then Yeah, |
|
|
55:09 | type of fermentation that the V traversal often is our process a lot of |
|
|
55:19 | in the network, if it it tends to be the case that |
|
|
55:25 | transposition does. So if anyone is in trying to figure out the capabilities |
|
|
55:31 | the interconnection network in a cluster uh to us you two tried the controversial |
|
|
55:41 | matrix transposition and find out what one may work well And the other |
|
|
55:46 | may not and it's also good if buy something, it's a good benchmark |
|
|
55:52 | or acceptance criteria. If you want test what the vendor promised in terms |
|
|
55:57 | their network and I will stop before on to the next. Okay, |
|
|
56:14 | . So mean in some cases that's yes to do say informant and |
|
|
56:33 | So like in the case of that's good a wrong application. I |
|
|
56:43 | about to say in terms of Yes To do sort of the forward |
|
|
56:48 | in order to find out how many coefficients you need to kind of or |
|
|
56:54 | enough to represent the image that sort compression that happens reduction from the number |
|
|
57:00 | pixels in the example I talked about the number of fourier coefficients but then |
|
|
57:05 | probably like to restore the image and you're going to do the inverse |
|
|
57:12 | Um in terms of the M I. That also mentioned, that's |
|
|
57:21 | application. It turns out data that get from the MRI scanner as it's |
|
|
57:34 | . The input data is actually in full year space. So they get |
|
|
57:38 | image you need to run an enormous transform and that may be all that |
|
|
57:45 | want because you may want to understand pictures of their organs that are trying |
|
|
57:51 | image and then then you may work the image space. Now in terms |
|
|
57:59 | the american methods, it turns out people often transforms, It's a geometric |
|
|
58:08 | or time into the for your domain then it turns out manipulations or algorithms |
|
|
58:16 | faster to use in the four year and then eventually I should have done |
|
|
58:22 | . You want to come back to space in which you started. So |
|
|
58:29 | quite common at least in terms of and engineering numerical methods that You use |
|
|
58:39 | forward and inverse f. 50s in . Yeah. And this slide basically |
|
|
58:45 | that if that's the case, one necessarily need to bother about trying to |
|
|
58:53 | order. So the traversal is built and the reversion implied in the |
|
|
59:02 | But as long as you know where data is, if you do a |
|
|
59:06 | followed by an inverse transform the result the end is that the output is |
|
|
59:12 | the same order as the and so why it's good to do if one |
|
|
59:20 | want to see if the coals or else. There is no need necessarily |
|
|
59:27 | um including the traversal stage, Uh when we're getting the index memory |
|
|
59:39 | we use a compliment context instead of the so I was a little bit |
|
|
59:47 | Some NBC if it gets clear. , the compliment. So let me |
|
|
60:01 | the next line maybe see if that's the question is or um so there |
|
|
60:10 | no beach worse unnecessary. So um come back to this. So if |
|
|
60:16 | do. Okay. Yeah. So Oh okay. So it was |
|
|
60:29 | think that you said the compliment. that what I heard? That's right |
|
|
60:36 | compliment. In what sense? I if you're taking 40 so if you |
|
|
60:43 | a complemented it's very like all ones it'll be like a mirror image but |
|
|
60:48 | a different restaurants. Um So so you mind repeating and see if it |
|
|
61:01 | through more clearly? Um What is is rather than the worst thing of |
|
|
61:08 | ? Uh And you simply just flip bits Make a 01. Does that |
|
|
61:17 | not change any? Does that change alphabet? Um Okay I'll see if |
|
|
61:27 | following answers the question so mhm. point on this slide is essentially you |
|
|
61:34 | need a bit reversal if you do both. Now this line was you |
|
|
61:38 | the input in normal order and that eventually you get Results of the two |
|
|
61:43 | combination. Also in normal order. you have the bit traverse input order |
|
|
61:48 | works as well find them the middle the normal order. And if you |
|
|
61:52 | the uh members of fatigue to that you get back to Big river |
|
|
61:58 | That's that was the question. Right we now I don't think so. |
|
|
62:19 | question I think if I understood correctly rather than doing the reversal, one |
|
|
62:27 | the processes that showed up um if the reverse and so on. Okay |
|
|
62:37 | sorry the reversal of the data No it's simple. Let's simply just |
|
|
62:50 | the bits rather than reversing the order make a 01 and change 1-0 rather |
|
|
62:57 | reversing the order. Uh investor. . But even so doing in the |
|
|
63:10 | part to bring in a lot of for different limits. So you can |
|
|
63:18 | manipulate in the seas and whatever way want. But you cannot change the |
|
|
63:27 | items. That needs to be combined each stage. So it's important. |
|
|
63:35 | possibly index of the values that are by the numbers or that's just the |
|
|
63:46 | are there to illustrate a binary encoding the integers and the data then lives |
|
|
63:58 | memory locations. And there's a question retrieving getting the right address is to |
|
|
64:06 | the proper data. That should be in the butterflies. And if you |
|
|
64:12 | it in place you have no choice the output then corresponds to in |
|
|
64:20 | C. So after you've gone through the records of stages, in order |
|
|
64:25 | understand which frequency component in this case capital X. Is in that memory |
|
|
64:34 | . You need to basically do a reversal on the address in which it |
|
|
64:42 | , then you will find which frequency is in that memory location. |
|
|
65:02 | I'm gonna be after this. So you don't do it in place, |
|
|
65:06 | means you use temporary storage. So each butterflies state you choose to write |
|
|
65:13 | else in memory. Then you can what's known as self sorting and |
|
|
65:27 | So that means inherently do. Additionally in Prima Titian's into the stages in |
|
|
65:34 | FFT and the self sorting in place tricky but they can be done but |
|
|
65:40 | you built in the equivalent of the reversal into the various stages On the |
|
|
65:48 | 15. So it's important to think terms of a data items that are |
|
|
66:01 | combined and where the lives in memory figure out not to address memory to |
|
|
66:07 | the right values into the right panel . And this slide just assumes that |
|
|
66:14 | have an array in memory in which is input data and in this case |
|
|
66:20 | take the normal and the bit reverse it if you have other ordering of |
|
|
66:25 | input data you need to make sure your address is such that you're all |
|
|
66:31 | access input data. That in terms the input data ordering is halfway through |
|
|
66:40 | sequence of part, depending upon the of the sequence. And then you |
|
|
66:46 | a different map to memory if you use normal or be Trevor story |
|
|
66:53 | So important to distinguish uh the combination input data and needs to take place |
|
|
67:04 | where they live and in this case really simple for having them in this |
|
|
67:10 | the input data. Okay. his addresses and memories from 0 to Oh |
|
|
67:21 | that help the question bomba? somewhat we can talk tomorrow one. |
|
|
67:32 | was saying that what the question was here you reverse the bits of the |
|
|
67:41 | or find the index in extras. um I'm a little bit uh hesitant |
|
|
67:52 | say that I reversed the bits because order to get understand what component is |
|
|
67:59 | bit in a particular memory location, need to reverse the bit of the |
|
|
68:09 | . President, the compliment devastates if did a compliment then the complement of |
|
|
68:16 | bits is not the same. You if you complement the address, did |
|
|
68:23 | get a different number than if you traversed the number if that's the |
|
|
68:31 | Yeah that's kind of what? Right If you take the 1st 001 and |
|
|
68:41 | a bit complement of that, you 1110 which is not type and that |
|
|
68:50 | the question. Okay, shall I , yep okay sorry Yeah and I'm |
|
|
69:04 | to answer when I'm back from the is easy but so this was the |
|
|
69:09 | that often do the inverse one doesn't to do the bit reversal as long |
|
|
69:15 | you know where the data lives because need to apply the right set of |
|
|
69:19 | factors that the combination would do Okay, let's see what happens |
|
|
69:25 | Mhm um Yeah so this is just different. So we found that um |
|
|
69:39 | Embers 50 renters just the members of total factor. So that's fairly simple |
|
|
69:48 | now let's see decimation in time has same property. So again, I |
|
|
69:52 | the example here, normal order input then things are d traversed and then |
|
|
69:58 | chris towards the order. So that's same. Um Now this slide is |
|
|
70:10 | trying to point something else out and is that if you and this is |
|
|
70:19 | particular for distributed Fft where this light important. So if you think of |
|
|
70:31 | Partitioning the input vectors that put four In each of four notes and you |
|
|
70:38 | the first zero for three values, a one note than the 4-70s and |
|
|
70:42 | different note. So you kind of uh lines across here. Then if |
|
|
70:52 | then look at the twiddle factor, values of things within the little square |
|
|
71:00 | . As you go through a line the forward to the members, you |
|
|
71:06 | see that effectively the same twiddle factors so being used in the same mold |
|
|
71:15 | the forward and the inverse. So fiddle factors are now in the right |
|
|
71:20 | , but it only applies if you the combination of the summation in time |
|
|
71:23 | decimation in frequency. Otherwise it does . So that's part of the reason |
|
|
71:30 | I said early on that for distributed 50s, it's essential to know about |
|
|
71:37 | forms of decimation in time with this frequency in order not to have to |
|
|
71:43 | with the factors or also communicate twitter or re compute that. So this |
|
|
71:51 | the most efficient type of combination and doesn't matter which one is done |
|
|
71:56 | Uh decimation in time and decimation in , but one of them should be |
|
|
72:02 | in time and one should be decimation frequency. Okay, let's see what |
|
|
72:11 | asked. Try to get to before time is up. So this is |
|
|
72:18 | just saying what I already said. now I wanted to document this once |
|
|
72:24 | some of you, What do you ? A 15 projects? And since |
|
|
72:30 | course is about making efficient use of , memory, hierarchies are particular significance |
|
|
72:40 | this point. It's important for So I had a few minutes and |
|
|
72:46 | love Okay pointed out. So now is this higher ratings topic that comes |
|
|
72:55 | mentioned readies for an eight early on terms of Cooley Tukey fft and the |
|
|
73:02 | to use the properties of the title factors that go halfway around the the |
|
|
73:07 | circle. It is um The power metal becomes -1. So basically the |
|
|
73:17 | any or meter in the upper Yeah. And you can go halfway |
|
|
73:24 | away from that and you have the Value except it's multiplied by -1. |
|
|
73:30 | that means we only needed half of trail of factors and can replace the |
|
|
73:36 | ones but taking the upper half And deployed on mid 90 swan. There's |
|
|
73:42 | one that is very convenient and that actually go a quarter way around the |
|
|
73:47 | circle um january reached the imaginary one I. So that's the particular |
|
|
73:57 | It doesn't require a and they're real . Multiply with high this change is |
|
|
74:03 | real and imaginary components. So that's basis for the Israelis for tight fifties |
|
|
74:15 | I have about five more minutes to tell you this is a bunch of |
|
|
74:21 | equation that looks messy and probably impossible comprehend by staring at in a |
|
|
74:28 | But you can see there are four that is basically going around in this |
|
|
74:33 | taking the input factor, young dis frequency instead of doing an index and |
|
|
74:40 | halfway the vector length you go a way the vector length people before and |
|
|
74:46 | two people before and flip you before then you do the same thing with |
|
|
74:52 | to the output and not just look than even but there's still a quarter |
|
|
74:59 | ah separation um Sorry, four sets basically four instead of two. Or |
|
|
75:12 | now you cover the range to the Multiple of four by adding 1, |
|
|
75:17 | and three. So if you do well just quickly give you the point |
|
|
75:25 | and in order to give you what the point of all of this |
|
|
75:29 | So you get sort of now ratings . But the fine looks like this |
|
|
75:35 | as you can see in this case is effectively four and radios to butterflies |
|
|
75:43 | in this rating for butterflies. Um this ratings for there is a multiplication |
|
|
75:51 | -1. That is just simply flipping and imaginary components with the proper |
|
|
75:57 | And so now you have At the three complex multiplication with privilege factors. |
|
|
76:08 | we had one complex multiplication from each of the four multiply. So for |
|
|
76:16 | so now we basically have instead of complex multiplication in the ratings too configuration |
|
|
76:22 | we have three instead of four. we say one complex multiplication And working |
|
|
76:29 | Great X four but that's not actually most important part. I will skip |
|
|
76:35 | , you can look at it. and this is this mission time, |
|
|
76:41 | can do the same thing. Um then you can even go to reading |
|
|
76:47 | but when you go a quarter way uh the unit circle. Yeah, |
|
|
76:52 | you need some numerical or uh arithmetic you end up with having a square |
|
|
76:58 | win there. So it's not quite cheap anymore. Um it does have |
|
|
77:04 | real foreign point operation but here's pictures it and I'm trying to get here |
|
|
77:11 | to the french sign that is important try to optimize performance and um so |
|
|
77:20 | the upper kind of table, refers to each one of the Butterfly |
|
|
77:28 | , Israelis to foreign eight. How ads? Some of subtractions multiplication |
|
|
77:33 | And how many storage references is and point is then that the idea is |
|
|
77:39 | each one of these butterfly competitions fits fast storage so you just retrieve input |
|
|
77:48 | output data for the butterflies. You have to do any um right thing |
|
|
77:59 | loading from memory to carry out uh really ate a butterfly for example. |
|
|
78:05 | all loaded in what? So if do that one can see that in |
|
|
78:11 | of storage references, if you look the right thing, column going from |
|
|
78:17 | to turns ratings. eight Reduces the access by the more than a factor |
|
|
78:24 | two. It also says a little in terms of the number of operations |
|
|
78:31 | in that case it goes from five roughly four. So it's about 20% |
|
|
78:37 | and arithmetic but it's more than a to the memory references. And since |
|
|
78:43 | references is usually the bottleneck and most systems today, That's why during this |
|
|
78:52 | ratings of 50 are critical and most use them and they the extension is |
|
|
79:00 | won best. It does a red . That is as large as you |
|
|
79:06 | do. That fits in the cash , level one cars and then level |
|
|
79:11 | cars and the three cash. Um means that the look for as you |
|
|
79:17 | from level one to level two, ratings is larger uh etcetera. And |
|
|
79:23 | for the same concept as for the matrix multiplication I talked about last time |
|
|
79:31 | now I'm sure that my time is . So I take questions and I |
|
|
79:38 | stop here and they have been through in detail next time the rest certain |
|
|
79:51 | that I have today, you can some print the picture and flip them |
|
|
79:58 | very quickly here. So sorry it's there as well, you can see |
|
|
80:03 | happens. So in this case the axis is performance in some sense and |
|
|
80:09 | two horizontal access is memory strikes for and output. As you can see |
|
|
80:16 | the strikes gets larger, then the drops and high jobs depends on the |
|
|
80:23 | cache policies for the particular processor that have. But keeping things small in |
|
|
80:32 | definitely has a significant impact on the and I think there was another sign |
|
|
80:39 | that shows the range, the blue , it's kind of bad performance in |
|
|
80:45 | of being put them up, put and kind of brownish reddish bars and |
|
|
80:51 | all that is when you have really strikes and managed to make good. |
|
|
80:55 | first use of cashes and the green are coming on the average for this |
|
|
81:02 | but so it's a huge difference but are slicing, it can look through |
|
|
81:07 | and you're gonna get an idea what is good, good if no |
|
|
81:17 | I will stop, you have to sharing. I was stopped recording. |
|
|
81:23 | huh uh huh I don't know if lost to Russia where along the way |
|
|
81:31 | I think uh stop |
|