00:02 | So welcome to my talk on data systems. So my name is |
|
00:09 | So the research in my group involves large data sets, large files, |
|
00:15 | documents, tables, et cetera, that are continuously growing, right, and become very |
|
00:24 | Right? And that's what we call these days big data, which stands for volume, |
|
00:31 | velocity and variety. So how do we do it? We use fast external |
|
00:37 | . We use efficient structures that work at two storage levels prime |
|
00:43 | disk. In general, we do it in parallel. We develop parallel algorithms |
|
00:48 | can work on a multi-node cluster with distributed storage. As major goals, we |
|
00:58 | algorithms that have linear time complexity and linear speedup. Some analytics we |
|
01:06 | in the group include machine learning, graphs, and exploration that I will explain in |
|
01:12 | little bit more detail. So in , my approach to conducting research is |
|
01:19 | to apply CS theory to develop good . So data science involves combining theory |
|
01:29 | programming. Again, from a theory perspective, we study the time complexity of algorithms. We perform |
|
01:36 | O analysis as well as I/O analysis. We extend and study many |
|
01:44 | algorithms in computer science. We use mathematical tools from linear algebra and as |
|
01:54 | analytic applications, we have machine learning and graphs. The programming |
|
02:01 | done mainly in C++, C, and Python, but they are combined with |
|
02:08 | written in R, SQL, Scala, and JavaScript, depending on the application. Most of my |
|
02:16 | is conducted on UNIX machines. That's where we develop the code, right? |
|
02:20 | we also do some development and on Windows. From the system side, we |
|
02:28 | multithreaded programming. We perform I/O on binary files. We exploit parallel |
|
02:34 | systems. We are careful about memory, using main memory in a wise |
|
02:40 | We generate code, we optimize et cetera. So we have a |
|
02:45 | of fun. This is an example of the problems we solve in my group, |
|
02:53 | upper left we have a cube, in which we explore a multidimensional data set with |
|
03:02 | trying to find important trends. On the right, we show an interesting summarization |
|
03:10 | works for many machine learning models, including regression, Bayesian classification, principal component |
|
03:19 | analysis, and K-means clustering. On the left, we have some of my |
|
03:26 | advanced research in machine learning, multivariate statistical models, where we show a sample of several |
|
03:33 | models that today represent one of the best approaches competing with neural networks |
|
03:40 | develop predictive models. On the lower right hand we have graphs, right? |
|
03:46 | also work a lot on graphs, and those problems include reachability, measuring |
|
03:54 | detecting cliques, right? And the is showing a reachability problem in |
|
03:59 | graph with nine vertices, going from to 8. So why should you |
|
04:05 | my group? The research we conduct presents a balance between theory and |
|
04:13 | We are proud to say that we understand how a machine learning algorithm works, step by |
|
04:19 | step instead of just calling them. We relearn classical algorithms that |
|
04:25 | saw previously in computer science, but see them working on truly large |
|
04:30 | We build open source tools, right? And |
|
04:35 | will be part of that. Our programs have many applications, going |
|
04:40 | data science and big data to databases, even images. From a job |
|
04:46 | I mean, the outlook is right? And any company developing |
|
04:52 | software may be interested in you when you finish your PhD. |
|