CS 6674
Last Updated: Schedule of Classes, November 20, 2025 7:07PM EST
Course Description
Course information provided by the 2025-2026 Catalog.
Multimodal representations are reshaping computer vision, driving advances in both understanding and generation across a wide range of perceptual tasks. This research-oriented course explores computer vision techniques that integrate images with additional modalities, such as language and 3D geometry, to address challenges in both analysis and synthesis tasks. Possible topics include visual grounding, multimodal alignment, and text-guided generation and editing across multiple 2D and 3D representations.
Enrollment Priority: Enrollment limited to Cornell Tech PhD students.
Recommended prerequisites: CS 3780/CS 5780 or equivalent, and CS 5670 or equivalent.
Last 4 Terms Offered: None
Learning Outcomes
- Analyze state-of-the-art multimodal techniques and understand their architectures and capabilities.
- Identify open challenges in the field of multimodal computer vision.
- Design deep learning pipelines that combine multiple modalities to address a research-oriented problem in computer vision.