In human-robot collaboration, widely used human-robot communication is based mainly on audio and haptic channels, which makes the interaction feel stiff and mechanical. Inspired by human-human communication, in which vision and hearing together account for over 88% of perception, we propose a knowledge-driven audio-visual virtual agent system that allows a collaborative robot to present its knowledge and feelings in a human-like way. During collaborative training, the virtual agent builds assembly knowledge of how to work with its co-worker via inverse reinforcement learning. When deployed on a co-assembly task with its human partner, the virtual agent produces assembly-knowledge-based responses, including knowledge-driven speech and speech-synchronized facial animation. By leveraging the proposed knowledge-driven virtual agent, the collaborative robot can not only fulfill the co-assembly task but also communicate with its human partner in a more natural way.
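The abstract states that the agent acquires its assembly knowledge through inverse reinforcement learning. As a hedged illustration only (the paper's actual formulation is not given here), a minimal feature-expectation-matching IRL sketch in Python could look like the following; all names, feature dimensions, and data are purely hypothetical stand-ins:

```python
import numpy as np

# Hypothetical toy setup: each demonstrated assembly step is encoded as a
# feature vector (e.g. part type, tool used, ordering cues). Dimensions
# and values are illustrative assumptions, not from the paper.
rng = np.random.default_rng(0)
n_features = 4

# Average feature vector of the human co-worker's demonstrated steps
expert_fe = np.array([0.8, 0.1, 0.6, 0.3])

def policy_feature_expectations(w, n_samples=500):
    """Return the average feature vector of steps chosen by a softmax
    policy induced by the linear reward weights w (a toy stand-in for
    rolling out a planner under the current reward estimate)."""
    candidates = rng.random((n_samples, n_features))
    scores = candidates @ w
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs @ candidates

# Gradient ascent on linear reward weights: push the policy's feature
# expectations toward the expert's (the core idea behind many IRL methods)
w = np.zeros(n_features)
for _ in range(200):
    grad = expert_fe - policy_feature_expectations(w)
    w += 0.1 * grad
```

The learned weights `w` then score candidate assembly actions, so the agent's speech and animation can be conditioned on the step it currently believes the human prefers. Matching feature expectations, rather than imitating actions directly, is what lets the recovered reward generalize to assembly situations not seen in the demonstrations.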