Claude Shannon invented information theory in 1948 to study the fundamental limits of communication. The theory not only establishes the baseline to judge all communication schemes but inspires the design of ones that are simultaneously information optimal and computationally efficient. In this talk, we discuss how this point of view can be applied on the problems of de novo DNA and RNA assembly from high-throughput sequencing data. We establish information limits for these problems, and show how efficient assembly algorithms can be designed to perform close to these information limits, despite the fact that combinatorial optimization formulations of these problems are NP-hard. We discuss Shannon and HINGE, two de novo assembly software designed based on such principles and compare their performance against state-of-the-art assemblers on several high-throughput sequencing data-sets.
Biography:
David Tse received the B.A.Sc. degree in systems design engineering from University of Waterloo in 1989, and the M.S. and Ph.D. degrees in electrical engineering from Massachusetts Institute of Technology in 1991 and 1994 respectively. From 1994 to 1995, he was a postdoctoral member of technical staff at A.T. & T. Bell Laboratories. From 1995 to 2014, he was on the faculty of the University of California at Berkeley. He is currently a professor at Stanford University.