With the ever-growing reliance on Artificial Intelligence (AI) across diverse domains, there is increasing concern about biases and unfairness inherent in AI systems. Fairness problems in automated interview assessment systems, especially video-based ones, have received little attention despite their prevalence in recruiting. In this paper, we propose a method that mitigates fairness problems in an automated interview assessment system that takes multimodal data as input. The core idea is a regularization term that minimizes the Wasserstein distance between the model's output distributions for two sensitive groups, together with a hyperparameter that controls the trade-off between fairness and accuracy. To test our method under various data settings, we also propose a preprocessing method that can manually adjust the underlying degree of unfairness in the training data. Experimental results show that our method achieves state-of-the-art fairness compared to previous methods.
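To make the regularization idea concrete, the following is a minimal sketch (not the authors' code) of a Wasserstein-based fairness penalty added to a regression-style scoring loss. The one-dimensional 1-Wasserstein distance is approximated by comparing the quantile functions of the two groups' predicted scores; the names `fair_loss`, `lambda_fair`, and the binary `group` mask are illustrative assumptions, and each batch is assumed to contain samples from both groups.

```python
# A minimal sketch of a Wasserstein fairness regularizer (assumed setup,
# not the paper's implementation). `lambda_fair` plays the role of the
# hyperparameter trading off fairness against accuracy.
import torch
import torch.nn.functional as F

def wasserstein_1d(a: torch.Tensor, b: torch.Tensor,
                   n_quantiles: int = 100) -> torch.Tensor:
    """Approximate the 1-Wasserstein distance between two 1-D empirical
    distributions by averaging the gap between their quantile functions."""
    qs = torch.linspace(0.0, 1.0, n_quantiles, device=a.device)
    return (torch.quantile(a, qs) - torch.quantile(b, qs)).abs().mean()

def fair_loss(scores: torch.Tensor, targets: torch.Tensor,
              group: torch.Tensor, lambda_fair: float = 1.0) -> torch.Tensor:
    """Task loss (MSE) plus a penalty pushing the score distributions of
    the two sensitive groups (group == 0 vs. group == 1) closer."""
    task = F.mse_loss(scores, targets)
    penalty = wasserstein_1d(scores[group == 0], scores[group == 1])
    return task + lambda_fair * penalty
```

Under this formulation, setting `lambda_fair = 0` recovers the plain task loss, while larger values enforce closer agreement between the two groups' score distributions at a potential cost in accuracy, matching the trade-off described above.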