Using Deep learning for Multi-modal and Graph-based Inputs — Ray Ptucha

Deep learning has enabled incredible advances in computer vision, natural language processing, and general pattern understanding. Success in this space spans many domains, including object detection, speech recognition, and action/scene interpretation. For targeted tasks, results are on par with, and often surpass, the abilities of humans. This talk addresses two limitations of current deep learning research: 1) connecting multi-modal inputs such as vision and language into a common latent representation exposes weaknesses in vector representation; and 2) creating CNNs for graphs is problematic because neither FIR filtering nor common pooling techniques are applicable. It also highlights recent discoveries and research done at RIT.


Raymond Ptucha is an Assistant Professor in Computer Engineering and Director of the Machine Intelligence Laboratory at Rochester Institute of Technology. His research specializes in machine learning, computer vision, and robotics. Ray was a research scientist with Eastman Kodak Company, where he worked on computational imaging algorithms and was awarded 31 U.S. patents, with another 19 applications on file. He graduated from SUNY/Buffalo with a B.S. in Computer Science and a B.S. in Electrical Engineering. He earned an M.S. in Image Science from RIT and a Ph.D. in Computer Science from RIT in 2013. Ray was awarded an NSF Graduate Research Fellowship in 2010, and his Ph.D. research earned the 2014 Best RIT Doctoral Dissertation Award. Ray is a passionate supporter of STEM education and is an active member of his local IEEE chapter and FIRST robotics organizations.