Face-to-face human communication is a multimodal and incremental process. An intelligent robot that operates in close relation with humans should have the ability to communicate with its human colleagues in such manner. The process of understanding and responding to multimodal inputs has been an interesting field of research and resulted in advancements in areas such as syntactic and semantic analysis, modality fusion and dialogue management. Some approaches in syntactic and semantic analysis take incremental nature of human interaction into account. Our goal is to unify syntactic/semantic analysis, modality fusion and dialogue management processes into an incremental multimodal interaction manager. We believe that this approach will lead to a more robust system which can perform faster than today's systems.