ICS Video Player

COSC6365 Intro to HPC Fall 2021 - Lecture25_FFT

Transcript ×

Auto highlight

Off

Font-size

00:02	Okay, so I will talk about fourier transforms today. And the reason

00:13	that One of the teams when I to do a project on the 50's

00:20	huh. As the acronym is for fourier transform. And uh my understanding

00:29	that the team is adventurers and don't a whole lot of background in terms

00:37	their 50s. And it's um one the most widely used computational procedures or

00:47	of anything. So I'll decide to about it and I'll try to done

00:54	once I described what it is, of the techniques were used to try

00:59	get good performance on processors. please. I love all right.

01:11	there's kind of an outline of So first introduced the discrete fourier

01:18	that is the basis for the fast transform and then look at how F

01:28	actually is working or put together. and mhm That's pretty much it.

01:41	motivation. So why one should be and quite important in many disciplines are

01:50	their 50s. This is an example original wide range of different types of

01:58	where it's used. So it on left hand side is kind of electron

02:05	microscopy that is basically dealing with They are highly very noisy images as

02:18	thing with the young go circle shows it looks like just noise but it

02:24	. And to try to make something out of these. I didn't always

02:27	images. A 15 placing key role eventually one can actually reconstruct images of

02:37	like the one at the bottom on left hand side. Another part of

02:43	you may be involved with colleagues in department they are interested in it,

02:51	resonance imaging or admire are I for uh in that case uh 50 is

02:59	the critical role because that's the way actually managed to get an image of

03:05	going on on or the structure of brain. Others are kind of more

03:11	the single processing domain and just show examples here in astronomy that tends to

03:17	very large 50s and billions of data and then on the right hand side

03:26	is what's typical for The use of . 50s and American competitions for solving

03:37	forms of for instance fluids problems are finding out what goes on and some

03:43	to make. And the slide is to illustrate that F. F.

03:53	. S. And also were used a very long time to do

03:59	So it turns out that the four basis with just two econometric functions like

04:07	and co signs are very good set basis functions as it's known in many

04:17	image processing and all some of This isn't solving Pds and many types

04:25	functions can be represented as a serious sign or co some functions and this

04:34	of shows the case of the compression upper left them E. Is an

04:44	of a bunch of pixels and then rest of them from B.

04:49	D. and E. And Is um representing sort of the fourier

04:56	on that. A set of pixels the pictures of the IV. And

05:04	upon how many components of the four serious one users from yet the not

05:11	great representation of E or one can get pretty good representation of the for

05:17	lot less than just communicating or using the pixels. So instead of during

05:25	pixels and it's not clear from the slide weather 10 24 is the size

05:35	civilian pixels that probably what it is represent the so in this case instead

05:41	a million pixels and he is pretty than Earth is indeed a better representation

05:47	E. And for F. Arizona 400, coefficients that pretty much gets

05:55	good the resolution as the pixel So that um and what the direct

06:03	that was used in the mpeg and play standards or what's known as sign

06:09	coastline transforms. Um Now therefore for transform, but it's based on before

06:16	transform, so that was a motivational or of four FTS. Um So

06:29	we talk about the discrete fourier transform the discrete fourier transform is simply a

06:35	version of The four Year Transform or trying to do before we transform that

06:43	kind of defined on the top, is based on an integral of a

06:48	that have F of X. And are these four year basis. The

06:56	and cosine is often represented by Roots unity as they're called or E to

07:01	-1 something. So so that leads to the discrete fourier transform and I'm

07:17	going through here, how you go the continuous domain to the discrete

07:22	So we just have to take it face value that if you do it

07:26	in America one ends up with is ID for your transform that is in

07:32	middle of the image and sorry for rotation. So um for the bottom

07:39	of the image F of X is with um lower case X. So

07:50	the E to the minus. And yeah. Oh my God max

07:55	replaced by this or Madam piece. that you can see that has some

08:05	, what's on top for the continuous analytical fourier transform. So um the

08:18	the collection that discreet from your transform for each capital X. Indexed or

08:26	em the right hand side is simply a product as you do that product

08:35	this over the piece. But the values to sample some one has the

08:42	then um it is simply in the and then and repeat that for a

08:53	of and values that covers A certain from 0 to the mine is wonderful

08:59	enumerated again discrete values. So it's social in the next side it ends

09:09	being simply made to expect an the inverse discrete for a transform is

09:17	very analogous as you can see it's uh maybe it's factored multiplication something so

09:26	that, let's look at that what matrix actually looks like. So the

09:33	is then the omega piece to the of M times J. So um

09:43	if M is zero, that is first drill in this matrix, The

09:50	of Omega P is zero regardless of . So that means um If the

09:57	zero then regardless of what the home that are than the exponential Haitian results

10:06	the wants, the first row is . It's also the case that the

10:12	column is a one because then Jay zero and that's a summation index of

10:19	semester is X. Of zero is multiply by one. So on her

10:27	um particular structure where now the other entries is for the second role Mm

10:38	one. So it as you go and through the inner product than the

10:45	exponents of the moment. A piece simply the j value. That then

10:51	from 1, 2 p -1 of different start in the first column from

10:55	to P -1 And then one has . Yes, plug in the

11:02	And the jenny, someone got the of this W matrix representing the uh

11:13	of unity or the omega keys in a discreet for your transform and the

11:25	of interesting property of the fourier transform the discrete fourier transforming them to be

11:33	. In this case said the matrix is simply the inverse of uh the

11:41	entries in the forward transform. And fairly easy to show that in fact

11:46	is true. Yeah, so that's it's quite easy to compute in verse

11:56	you transform if it actually has what's then called the forward transform. So

12:07	questions on the G. F. . And may be familiar to or

12:13	but it's not, you're welcome to questions If not, I will turn

12:26	one. So to the first floor transform that this thing that um used

12:36	extensively in many disciplines. So I even compete in my matrix multiply or

12:45	even it's more frequently Houston matrix So the first thing is just to

12:53	out that there is the cause of um popularity of this algorithm. The

13:01	comes from it competition all efficiency. unlike the Matrix electoral products, that

13:10	basically order P squared estimate tricks exercise times P before we transform is that's

13:21	proportion of the P times log P of P square, if the

13:27	probably nobody cares. But if p a billion or more points the difference

13:34	large as I will show on the slide. So the transport was actually

13:44	discovered by the all physicists and I uh later we discovered, I don't

13:54	the gap in years between the original the cool it. Okay. Um

14:03	on the festival you transform that somehow got the name recognition for this way

14:10	computing. The discrete fourier transform uh with the notion always can different radi

14:21	are I guess it's called so it's to four and eight and I

14:27	show you that and that's actually important understand because that's a property of deriving

14:34	algorithm that is the basis for getting efficient implementations of the Cooley to can

14:49	and the cool it to transform as will see, I want to talk

14:53	it is restricted to particular sizes of dataset or the input factor accident doesn't

15:04	for an arbitrary length factor. It fact only applies the power of two

15:10	factors so people and uh I think with it and come up with other

15:17	and which one is known as a race addict that can be adopted them

15:23	be more flexible in terms of input sizes. Another one is rader's algorithm

15:30	then there are particularly clever one for when the input factor is of a

15:40	length or can be factors in the of crimes and then continues this prime

15:48	over them. Um mhm So I going to talk about the quality of

15:57	. That's that's the simplest one. and probably no also for the project

16:06	someone you will do and there are trick Aries that is being played.

16:13	Remember from the definition of the discrete transform it had this video factors about

16:23	play the you can see here on top right hand corner. Let's eat

16:29	in this case minus to pay high where so that means the homemade apiece

16:35	complex numbers. So yeah. it turns out that the input

16:47	mhm real and not complex numbers then can actually take advantage of that To

16:55	about a factor of two in operations memory. And I may be able

17:00	make some comments on that before the is over and then there is further

17:07	uh efficiencies that can be again, the data has certain symmetry properties and

17:19	it reduces more or less by another of two in terms of work.

17:25	And on Houston so called signing coastline and that's as I mentioned, was

17:32	basis for compression algorithms that jpeg and standards for many years now I want

17:38	switch to using way that transforms into and the other two things. Self

17:48	and in place. Self sorting a bit obscure at this point but it

17:53	become clear as I talked about it place simply me in some fun um

18:02	allocate separate members for input and So which would kind of double the

18:10	requirements and if you have a billion transforms doubling it. It's kind of

18:16	necessarily what you want to do. that's what's called in place algorithms.

18:23	they worked in an array that starts the input data and when you're said

18:29	done it has the transformer, the data without using much of temp

18:40	So here is just the pointing out the means to do something that is

18:46	important benefit or the p squared versus P log p. So as I

18:54	, uh astronomers are the ones I I'm not use not just the one

19:02	to 2 30 is a billion points they use many billions of points

19:09	I happen to have work with some dana way way back In the 90s

19:18	at that point already did multiple being transforms and the difference between square and

19:29	is quite significant as you can so and the number five instead of

19:38	L O P will be fine So that's in fact what it is

19:44	call it. Take the transform. now I'm going to start to look

19:55	how this FFT is can be the and get some insight into what it

20:02	and what the cleverness is but I'll for a second before diving into detail

20:10	there are questions it's not. So So there are two versions of the

20:28	it okay, there are many Russians in terms of how they kind of

20:35	in how the memory references kind of . Yeah. Typically called the decimation

20:44	frequency and decimation in time. And will show both of them the most

20:53	the literature from anyone reads papers or know it. They tend to favor

21:00	decimation in time and they don't mentioned in terms of decimation in frequency.

21:07	it turned out it turns out that terms of parallel computing in particular if

21:14	do large FTS on clusters, it's to understand both. And hopefully pointing

21:27	out before the lecture Is over why should know both. All right.

21:36	yes, now I'm trying to show have to There, I know what

21:41	basis is for their 50 and how get from something that is or the

21:47	square to something that is full of . So it all depends on the

21:53	that this matrix the w matrix the of the matrix or interests uh unique

22:05	and that's what's being exploited. And first formula transformed. So we'll talk

22:14	how to exploit it. I'll start the thing that's true results in the

22:23	and supposed to that the destination in . So for the decimation in frequency

22:30	thing is to observe as this particular of once known as the twitter factors

22:38	the roots of unity the omega p we're probably most of the trunk.

22:45	the notion of twitter factors And that uh huh. Talking about the end

22:51	the piece and the notion why they're roots of unity is illustrated on the

22:58	hand side. So which is a of the unit circle in the

23:03	The main and the omega piece basically this unit circle and the Lord of

23:12	P. Is the finer or the the angle is for each one of

23:18	slices. Remember that on a happy it to the -2? I divided

23:26	peak. So Pete tells you how uh slices there are as you go

23:32	the human circle. No, since omegas are A to the -2 pi

23:44	over P. Um When you take power omega p To the people of

23:55	two power. Since I should have that on the slide. Sorry.

24:00	So e to the -2 Parts are about 30 times. T over two

24:07	simply E. To the minus um hi B. Two. So that

24:17	you go half away around the unit , So that part of our monarchy

24:25	simply -1. So because you think as you increase the power or made

24:34	p, you kind of add one at the time and P over

24:39	you're halfway around the unit circle. um as you keep increasing the power

24:45	you get that's the one. Now the decimation of frequency one can look

24:52	the sum that they had, that basically take a roll of the dft

25:01	, that is the inner product. the inner product then is the omega

25:08	too. The power that is a of withdrawal you're working on. And

25:15	for the, going through the columns the matrix and multiply it by the

25:20	factor. So if you do you can split up the sum In

25:25	first half of the inner product, is from 0 to Pi over 2

25:31	. Uh And then you do the half. And the way I did

25:35	second half of this line is I the same summation index. But then

25:42	the summation expression I replace jay from previous language, A plus P over

25:51	. So that covers the second part the summation range for the inner

25:58	So now I can use the properties this omega piece that when you take

26:06	the power of P over two is . So that means in this submission

26:15	You have -1. And I guess an excellent sorry, extra crime here

26:23	shouldn't be so I just discovered that . So I apologize. So,

26:31	and then so I just added these guys made it just one summation sign

26:39	it has the omega lP, that this part. Um And it's the

26:46	in the first half of the inner . And the difference for the second

26:51	is that this guy multiplies the expanse for the second part of the

26:59	so I can play uh for the here, someone consider also in an

27:05	. Values and will become clear when doing that in a little bit.

27:11	now if I look at the and even so this is best for the

27:21	here is the even ones and this be the odd ones. So if

27:28	, plug in uh to our prime of L. In the so upper

27:36	a quick slide equation. And then it's an even power of L then

27:46	to an even power is always plus . So then this expression simplifies to

27:54	part and for the other half the is odd. So then Basically

28:02	So in that case you get essentially expression and the other part that was

28:11	here since we have Formula P 2 crime one point as well. Um

28:23	at this six. Turn it into a repeat of the two as the

28:30	part. So remember um this I'm sorry, I'll need to flip

28:38	. Yes. Yeah. So here the armor to remove that. So

28:46	um if you have an even power this uh or kill me too,

28:55	can sorry if you have a two here then you can as well move

29:04	down and They would be over two the denominator, that's kind of what

29:10	exercises here. So at this point looks perhaps like totally meaningless manipulations but

29:19	not. Um So the difference why not is that we started out rested

29:33	this in a product here that is uh have p terms for every question

29:43	every L. So this is and enumerated for all the ills and we

29:48	this matrix vector application that we started . Now what they have here is

29:59	inner product. That is only of half feeling but we got two of

30:07	because there's one for all of them either. So it may not seem

30:13	much of again that I want to . So this is that it's not

30:20	have sites. So what's the Well, I guess first course,

30:27	on. So this is the notion the butterfly. So this is just

30:33	what's being done here. Right? to do each of these two have

30:39	. Dft is what's needed to be is to replace the input by um

30:46	these two input values here. Um the other part you have the take

30:53	difference between the same input values and you must apply it with this.

30:58	my God, jay a woman up to the power of Jack, this

31:05	, the Amadeus here in front is same invoked. So that's you cannot

31:10	ends up with the to do these slightly different input factors. One has

31:15	as an input factor and the other has the omega p to the power

31:20	times the difference as the imperfection. that's what's kind of illustrated on this

31:28	. So now you have the two and one can draw a picture of

31:32	and then it becomes known as saying . Uh people talk about butterfly competition

31:42	talk about butterfly networks when it comes interconnection networks and it's all because of

31:49	like the drugs relationship between the input the output there. So here is

31:59	of what we just did in terms um This generating the to have size

32:10	or uh yeah, so um and way we did it is that There

32:20	one for even components of the output there was one for the components of

32:25	output fractiousness run for the gun the X. The even ones and

32:31	there's the bottom one is the odd . So this is what was

32:40	And the first thing with all the kind of thing. And solid bullets

32:46	just this collection of butterflies. That the computation required to reduce it.

32:54	the requirement to to half size F. T. S. But

33:00	I can apply the same procedure over over again To these half size of

33:09	. So if one does that, ends up with something like this.

33:15	now we have and by doing recursive, lee dividing it while generating

33:24	size FTS that means after logarithmic number steps you're kind of done. So

33:32	why the depth in some sense of . FFT a butterfly network or stages

33:40	the competition this logarithmic and for each the number of operations is proportional

33:49	The number of data funds. So the way you want to insult.

33:55	the algorithm being of order peel api the mountain rotation. Today we're in

34:03	from using the normal thing. There's things to observe on this slide that

34:15	with the input and I normal border just taking the Integrates and you know

34:24	0 to 1615 in this case for input values. But because of this

34:32	even splitting we respect and the it turns out that the output is

34:44	referred to as scrambles. Um So means for instance if we look at

34:57	we take the lines as actual memory so in the first memory location then

35:06	the output for an in place algorithm um zero frequency components is in the

35:16	location as the zero input value but next memory location that used to have

35:24	second input value. Now it has the second frequency component but Frequency Capone

35:34	eight and it turns out that this scrambled order or lucic unscrambled order is

35:44	fact well defined as every traverse order I will talk more about that before

35:53	done and I'll probably I'll stop at point here and but let me see

36:00	my next flight is first. Um , so let me just make some

36:09	on this. And this is particularly When one does distributed the fifties and

36:24	it is looking at these so called the factors for roots of unity.

36:32	in this particular celebration did in the butterfly stage here, the roots of

36:44	corresponds to is slicing of the upper of the unit circle into peace

36:56	We only need half of it because used this property that in this decimation

37:06	frequency algorithm that they looked at data in the input, that is half

37:13	the input factor away from the So those twiddle factors, the fact

37:20	replaced by a minus times the total on the upper half of the unit

37:28	. Now as we proceed to the stages than data was of the half

37:35	size and there were two of them each half size. Fft used the

37:42	uh that the federal factors of the half it doesn't know whether it's other

37:47	input data or what not. This knows about the size and since the

37:54	now is half of the input. in terms of the power of the

38:03	subjects set the twiddle factors. There's half as many slices or in terms

38:10	exponents on his twitter factors. It go 123 anymore, but it goes

38:17	0, 2, 4 and basically twice the value of the twitter

38:26	Exponent in the 1st stage. And tell them what you're saying is also

38:33	means the people of factors used in second stage, it's a subset and

38:39	every other people in fact produced in first stage and then it goes further

38:45	. So then it becomes around only two unless every other of the

38:50	stage. So, and in the it's just a very simple bath

38:59	Yeah, So one more comment and I'll read through my comments under the

39:06	factor setting. So, um what and fft applications, one does many

39:18	f t s of the same size that's so that's what we typically

39:28	You compute the trader factor once and you re use them for every Application

39:36	50s of the same sex. So an area where you included for the

39:44	that computation in your code, but not a good way of doing it

39:50	. So you want first to uh , compute them and spend a little

39:57	of memory or the memory to um them and then just use them every

40:07	of the fft and that's where the is important to know how these things

40:15	in the context of distributed FTS because typically different little factors are allocated in

40:24	nodes of the cluster and you want be able to make sure that they

40:29	in the right place for where they needed. All right. And this

40:36	pretty much what I read is. , but they did not say is

40:43	. Well kind of if one has computed the total factors and store them

40:49	an array Because of this part or property on the cool. It took

40:54	of 50 one can use the address to figure out what the exponents are

41:00	this is simply what um, is the bullet points here says that it's

41:08	step off the most significant bit and into the race um, when you

41:16	to retrieve the proper total factor, questions on that before I talk about

41:24	decimation in time. Okay. So is also This otherwise decimation in time

41:40	tend to be the one that shows most of the time in textbooks about

41:44	50th as well as often in some people develop efficient algorithms were used

41:52	some competition. So here is now hmm. Wherever one looks since that

42:02	even input it. So in this the input factor x is petition

42:11	into looking at the even components in first submission here And help some a

42:19	of -1 such woman peanut yet. sorry about that. And then the

42:24	components of the input factor. Um then, so this is just again

42:32	inner product but rewritten in for even odd components of the input factor and

42:40	the same game. Uh Bye have . Uh Okay prime them being two

42:47	promptly even so I can take these and move it down to uh from

42:55	know from the top to making a . Over two instead. Someone now

42:59	kind of half size in a product also use some fiddle factors that are

43:05	half size half Ephesians. All So this is Oh just but now

43:09	-1 makes sense. Hopefully mm. And now one looks instead of 1st

43:18	2nd half of the output vector, capital. So under the decimation of

43:26	they love that input their points halfway and not even put values. Now

43:38	at what even input values and um values separated about half of the number

43:46	outputs families. Otherwise it's kind of in the manipulation. It's very

43:54	So if one does it this way me kind of derived the application of

44:03	divide and conquer scheme. Starting with output and now we used half

44:12	Fft is based on even or odd values but it's still kind of a

44:18	competition. The butterfly looks a little different in the sense that the complex

44:25	happens before the actual but if I on the decimation in frequency that happened

44:31	the output, one of the outputs the butterfly. So it's not particularly

44:37	but it's very similar and then one again continue to do the same exercise

44:43	and then man get something like So now, you know, we

44:50	with basic derivation from the output that had a normal order and then figure

44:58	this obvious split that happened in every step. It in fact resulted in

45:05	the kind of our input yes, to be in a bit traversed order

45:16	order for the output to be in order. There is a a few

45:28	things to comment on and I don't exactly my sights here but to be

45:35	here is now for this decimation in derivation all the twiddle factors spent along

45:45	After a 1/2 circle is used for last butterfly stage instead of the

45:54	So the kind of order in which title of factors are used is reversed

46:01	to the decimation in frequency. They're and that's kind of one important fact

46:12	in particular in the parallel computing or fFT competitions otherwise. So I think

46:26	so this is just script mentioned that and here's kind of but I also

46:32	talked about and harmon then use the address or in stage number to figure

46:42	uh huh how to index into a of physical factors. Let's see what

46:51	next time is. I guess it's the two that's already done in talking

46:58	them. Um So um the way derived it for the decimation in frequency

47:12	they input was a normal order and the output being a bit reversed

47:20	Um The things were totally opposite for decimation in time And I also commented

47:27	Howard 20 factors are being used now the fingers, right? The I

47:44	to talk about the last comment on slide, I think on the next

47:51	. So this basically showed that the on the left hand signers to instances

48:05	the destination of frequency on this there is the left side that is

48:09	what was previously in the derivation of Decimation of frequency at 15 but it's

48:17	fine to kind of commute the lions you like in the but if I

48:29	so then what happens if you were do that? Which is essentially to

48:33	well instead of having the import vector to memory in normal order, if

48:39	assign the values to memory location and d traversing the index of the input

48:48	, then you would get the output normal order. So, so the

48:55	of this is to say, well is what is often not pointed

49:02	is that one could as values the in frequency, as a decimation in

49:11	. If the data presented as its in vitro first order and the

49:21	stand up also for the decimation in even get the output in normal.

49:32	another point that I wanted to make I haven't been before. So whether

49:39	looks at the left slide, the butterflies combines input values that are half

49:49	apart. X zero and X. . Is part of one. Butterfly

49:54	and 9 is the input to another . And that obviously doesn't change in

50:00	of this, the allocation of memory the competition needs to be the

50:06	So in the first stage and the and fire and frequency one combines in

50:16	buffet computation input values. There are the input director apart from each

50:27	Now the fun. Alright. Uh I serve now if you look at

50:32	decimation in time left his side is what Peter ended up within driving the

50:40	in time. 15 and the same . Yes. To the traversal of

50:47	assignment values, the memory location and get the right hand picture that shows

50:53	. So if you have there in order has important Eucharist values the destination

50:59	tiredness. The decimation in frequency. the consequences that now the output to

51:04	traverse for So they're both are perfectly to use regardless whether the input orders

51:14	and be traversed. The other point again like I pointed out for the

51:22	in frequency on the previous slide. even for the decimation in time.

51:27	the first stage you combine input There are half of the infant sector

51:36	. So the set of data values are combined as you go through the

51:41	of the FFT follows the same Whether you use decimation in time or

51:46	in frequency. That's another part that important to understand. And so the

51:57	reference pattern does not change whether you decimation in time, the decimation in

52:03	, it depends if the input order normal or be traversed but the combination

52:13	input values that is used in the of our stages. This is the

52:19	. Yeah the butterflies are slightly So how the trailer factors are used

52:25	different but it's the same fiddle frankness just the order in which their use

52:32	different. Yeah, so that um just to comment on this and then

52:43	will stop so just to be kind obvious um just illustrating what the controversial

52:53	in terms of having a binary representational um and it just best there so

53:03	look at the binary digits in the cold column and you look at in

53:11	bitter for asbestos, if you write from left to right then it was

53:16	of write them down in the opposite when you do to be controversial.

53:19	nothing more to it than that. it's very well defined what it

53:23	Um It's yes uh kind of do reflection on the bits around the need

53:34	So now comment on this and then was gone again and see if there

53:40	questions. So the victory reversal is that has received a lot of interest

53:54	again, interferes in one of the common algorithms out there and sometimes people

54:02	to have the same input and output . That means we need to add

54:06	stage that restores the order that was in the input and that is the

54:15	. Now people also want to not have temporary race to have So they

54:22	to do it in place like they to hear 50 itself and then it

54:27	somewhat tricky. So there are many um about how to do the traverse

54:37	in place to avoid a large amount temp storage and their papers there's also

54:46	book on for fast fourier transform but band on uh on a university that

54:53	a very good reference for both the fifties. The other point to be

55:00	of As a woman look how clusters Mandelson distributed their 15 then Yeah,

55:09	type of fermentation that the V traversal often is our process a lot of

55:19	in the network, if it it tends to be the case that

55:25	transposition does. So if anyone is in trying to figure out the capabilities

55:31	the interconnection network in a cluster uh to us you two tried the controversial

55:41	matrix transposition and find out what one may work well And the other

55:46	may not and it's also good if buy something, it's a good benchmark

55:52	or acceptance criteria. If you want test what the vendor promised in terms

55:57	their network and I will stop before on to the next. Okay,

56:14	. So mean in some cases that's yes to do say informant and

56:33	So like in the case of that's good a wrong application. I

56:43	about to say in terms of Yes To do sort of the forward

56:48	in order to find out how many coefficients you need to kind of or

56:54	enough to represent the image that sort compression that happens reduction from the number

57:00	pixels in the example I talked about the number of fourier coefficients but then

57:05	probably like to restore the image and you're going to do the inverse

57:12	Um in terms of the M I. That also mentioned, that's

57:21	application. It turns out data that get from the MRI scanner as it's

57:34	. The input data is actually in full year space. So they get

57:38	image you need to run an enormous transform and that may be all that

57:45	want because you may want to understand pictures of their organs that are trying

57:51	image and then then you may work the image space. Now in terms

57:59	the american methods, it turns out people often transforms, It's a geometric

58:08	or time into the for your domain then it turns out manipulations or algorithms

58:16	faster to use in the four year and then eventually I should have done

58:22	. You want to come back to space in which you started. So

58:29	quite common at least in terms of and engineering numerical methods that You use

58:39	forward and inverse f. 50s in . Yeah. And this slide basically

58:45	that if that's the case, one necessarily need to bother about trying to

58:53	order. So the traversal is built and the reversion implied in the

59:02	But as long as you know where data is, if you do a

59:06	followed by an inverse transform the result the end is that the output is

59:12	the same order as the and so why it's good to do if one

59:20	want to see if the coals or else. There is no need necessarily

59:27	um including the traversal stage, Uh when we're getting the index memory

59:39	we use a compliment context instead of the so I was a little bit

59:47	Some NBC if it gets clear. , the compliment. So let me

60:01	the next line maybe see if that's the question is or um so there

60:10	no beach worse unnecessary. So um come back to this. So if

60:16	do. Okay. Yeah. So Oh okay. So it was

60:29	think that you said the compliment. that what I heard? That's right

60:36	compliment. In what sense? I if you're taking 40 so if you

60:43	a complemented it's very like all ones it'll be like a mirror image but

60:48	a different restaurants. Um So so you mind repeating and see if it

61:01	through more clearly? Um What is is rather than the worst thing of

61:08	? Uh And you simply just flip bits Make a 01. Does that

61:17	not change any? Does that change alphabet? Um Okay I'll see if

61:27	following answers the question so mhm. point on this slide is essentially you

61:34	need a bit reversal if you do both. Now this line was you

61:38	the input in normal order and that eventually you get Results of the two

61:43	combination. Also in normal order. you have the bit traverse input order

61:48	works as well find them the middle the normal order. And if you

61:52	the uh members of fatigue to that you get back to Big river

61:58	That's that was the question. Right we now I don't think so.

62:19	question I think if I understood correctly rather than doing the reversal, one

62:27	the processes that showed up um if the reverse and so on. Okay

62:37	sorry the reversal of the data No it's simple. Let's simply just

62:50	the bits rather than reversing the order make a 01 and change 1-0 rather

62:57	reversing the order. Uh investor. . But even so doing in the

63:10	part to bring in a lot of for different limits. So you can

63:18	manipulate in the seas and whatever way want. But you cannot change the

63:27	items. That needs to be combined each stage. So it's important.

63:35	possibly index of the values that are by the numbers or that's just the

63:46	are there to illustrate a binary encoding the integers and the data then lives

63:58	memory locations. And there's a question retrieving getting the right address is to

64:06	the proper data. That should be in the butterflies. And if you

64:12	it in place you have no choice the output then corresponds to in

64:20	C. So after you've gone through the records of stages, in order

64:25	understand which frequency component in this case capital X. Is in that memory

64:34	. You need to basically do a reversal on the address in which it

64:42	, then you will find which frequency is in that memory location.

65:02	I'm gonna be after this. So you don't do it in place,

65:06	means you use temporary storage. So each butterflies state you choose to write

65:13	else in memory. Then you can what's known as self sorting and

65:27	So that means inherently do. Additionally in Prima Titian's into the stages in

65:34	FFT and the self sorting in place tricky but they can be done but

65:40	you built in the equivalent of the reversal into the various stages On the

65:48	15. So it's important to think terms of a data items that are

66:01	combined and where the lives in memory figure out not to address memory to

66:07	the right values into the right panel . And this slide just assumes that

66:14	have an array in memory in which is input data and in this case

66:20	take the normal and the bit reverse it if you have other ordering of

66:25	input data you need to make sure your address is such that you're all

66:31	access input data. That in terms the input data ordering is halfway through

66:40	sequence of part, depending upon the of the sequence. And then you

66:46	a different map to memory if you use normal or be Trevor story

66:53	So important to distinguish uh the combination input data and needs to take place

67:04	where they live and in this case really simple for having them in this

67:10	the input data. Okay. his addresses and memories from 0 to Oh

67:21	that help the question bomba? somewhat we can talk tomorrow one.

67:32	was saying that what the question was here you reverse the bits of the

67:41	or find the index in extras. um I'm a little bit uh hesitant

67:52	say that I reversed the bits because order to get understand what component is

67:59	bit in a particular memory location, need to reverse the bit of the

68:09	. President, the compliment devastates if did a compliment then the complement of

68:16	bits is not the same. You if you complement the address, did

68:23	get a different number than if you traversed the number if that's the

68:31	Yeah that's kind of what? Right If you take the 1st 001 and

68:41	a bit complement of that, you 1110 which is not type and that

68:50	the question. Okay, shall I , yep okay sorry Yeah and I'm

69:04	to answer when I'm back from the is easy but so this was the

69:09	that often do the inverse one doesn't to do the bit reversal as long

69:15	you know where the data lives because need to apply the right set of

69:19	factors that the combination would do Okay, let's see what happens

69:25	Mhm um Yeah so this is just different. So we found that um

69:39	Embers 50 renters just the members of total factor. So that's fairly simple

69:48	now let's see decimation in time has same property. So again, I

69:52	the example here, normal order input then things are d traversed and then

69:58	chris towards the order. So that's same. Um Now this slide is

70:10	trying to point something else out and is that if you and this is

70:19	particular for distributed Fft where this light important. So if you think of

70:31	Partitioning the input vectors that put four In each of four notes and you

70:38	the first zero for three values, a one note than the 4-70s and

70:42	different note. So you kind of uh lines across here. Then if

70:52	then look at the twiddle factor, values of things within the little square

71:00	. As you go through a line the forward to the members, you

71:06	see that effectively the same twiddle factors so being used in the same mold

71:15	the forward and the inverse. So fiddle factors are now in the right

71:20	, but it only applies if you the combination of the summation in time

71:23	decimation in frequency. Otherwise it does . So that's part of the reason

71:30	I said early on that for distributed 50s, it's essential to know about

71:37	forms of decimation in time with this frequency in order not to have to

71:43	with the factors or also communicate twitter or re compute that. So this

71:51	the most efficient type of combination and doesn't matter which one is done

71:56	Uh decimation in time and decimation in , but one of them should be

72:02	in time and one should be decimation frequency. Okay, let's see what

72:11	asked. Try to get to before time is up. So this is

72:18	just saying what I already said. now I wanted to document this once

72:24	some of you, What do you ? A 15 projects? And since

72:30	course is about making efficient use of , memory, hierarchies are particular significance

72:40	this point. It's important for So I had a few minutes and

72:46	love Okay pointed out. So now is this higher ratings topic that comes

72:55	mentioned readies for an eight early on terms of Cooley Tukey fft and the

73:02	to use the properties of the title factors that go halfway around the the

73:07	circle. It is um The power metal becomes -1. So basically the

73:17	any or meter in the upper Yeah. And you can go halfway

73:24	away from that and you have the Value except it's multiplied by -1.

73:30	that means we only needed half of trail of factors and can replace the

73:36	ones but taking the upper half And deployed on mid 90 swan. There's

73:42	one that is very convenient and that actually go a quarter way around the

73:47	circle um january reached the imaginary one I. So that's the particular

73:57	It doesn't require a and they're real . Multiply with high this change is

74:03	real and imaginary components. So that's basis for the Israelis for tight fifties

74:15	I have about five more minutes to tell you this is a bunch of

74:21	equation that looks messy and probably impossible comprehend by staring at in a

74:28	But you can see there are four that is basically going around in this

74:33	taking the input factor, young dis frequency instead of doing an index and

74:40	halfway the vector length you go a way the vector length people before and

74:46	two people before and flip you before then you do the same thing with

74:52	to the output and not just look than even but there's still a quarter

74:59	ah separation um Sorry, four sets basically four instead of two. Or

75:12	now you cover the range to the Multiple of four by adding 1,

75:17	and three. So if you do well just quickly give you the point

75:25	and in order to give you what the point of all of this

75:29	So you get sort of now ratings . But the fine looks like this

75:35	as you can see in this case is effectively four and radios to butterflies

75:43	in this rating for butterflies. Um this ratings for there is a multiplication

75:51	-1. That is just simply flipping and imaginary components with the proper

75:57	And so now you have At the three complex multiplication with privilege factors.

76:08	we had one complex multiplication from each of the four multiply. So for

76:16	so now we basically have instead of complex multiplication in the ratings too configuration

76:22	we have three instead of four. we say one complex multiplication And working

76:29	Great X four but that's not actually most important part. I will skip

76:35	, you can look at it. and this is this mission time,

76:41	can do the same thing. Um then you can even go to reading

76:47	but when you go a quarter way uh the unit circle. Yeah,

76:52	you need some numerical or uh arithmetic you end up with having a square

76:58	win there. So it's not quite cheap anymore. Um it does have

77:04	real foreign point operation but here's pictures it and I'm trying to get here

77:11	to the french sign that is important try to optimize performance and um so

77:20	the upper kind of table, refers to each one of the Butterfly

77:28	, Israelis to foreign eight. How ads? Some of subtractions multiplication

77:33	And how many storage references is and point is then that the idea is

77:39	each one of these butterfly competitions fits fast storage so you just retrieve input

77:48	output data for the butterflies. You have to do any um right thing

77:59	loading from memory to carry out uh really ate a butterfly for example.

78:05	all loaded in what? So if do that one can see that in

78:11	of storage references, if you look the right thing, column going from

78:17	to turns ratings. eight Reduces the access by the more than a factor

78:24	two. It also says a little in terms of the number of operations

78:31	in that case it goes from five roughly four. So it's about 20%

78:37	and arithmetic but it's more than a to the memory references. And since

78:43	references is usually the bottleneck and most systems today, That's why during this

78:52	ratings of 50 are critical and most use them and they the extension is

79:00	won best. It does a red . That is as large as you

79:06	do. That fits in the cash , level one cars and then level

79:11	cars and the three cash. Um means that the look for as you

79:17	from level one to level two, ratings is larger uh etcetera. And

79:23	for the same concept as for the matrix multiplication I talked about last time

79:31	now I'm sure that my time is . So I take questions and I

79:38	stop here and they have been through in detail next time the rest certain

79:51	that I have today, you can some print the picture and flip them

79:58	very quickly here. So sorry it's there as well, you can see

80:03	happens. So in this case the axis is performance in some sense and

80:09	two horizontal access is memory strikes for and output. As you can see

80:16	the strikes gets larger, then the drops and high jobs depends on the

80:23	cache policies for the particular processor that have. But keeping things small in

80:32	definitely has a significant impact on the and I think there was another sign

80:39	that shows the range, the blue , it's kind of bad performance in

80:45	of being put them up, put and kind of brownish reddish bars and

80:51	all that is when you have really strikes and managed to make good.

80:55	first use of cashes and the green are coming on the average for this

81:02	but so it's a huge difference but are slicing, it can look through

81:07	and you're gonna get an idea what is good, good if no

81:17	I will stop, you have to sharing. I was stopped recording.

81:23	huh uh huh I don't know if lost to Russia where along the way

81:31	I think uh stop

Previous Next

00 : 01
03 : 47
06 : 29
09 : 27
12 : 53
18 : 41
22 : 29
33 : 11
37 : 41
41 : 21
46 : 55
52 : 41
56 : 21
69 : 29
74 : 19