Thermal cameras enable non-contact estimation of the respiratory rate (RR). Accurate estimation of RR is highly dependent on the reliable detection of the region of interest (ROI), especially when using cameras with low pixel resolution. We present a novel approach for the automatic detection of the human nose ROI, based on facial landmark detection from an RGB camera that is fused with the thermal image after tracking. We evaluated the detection rate and spatial accuracy of the novel algorithm on recordings obtained from 16 subjects under challenging detection scenarios. Results show a high detection rate (median: 100%, 5th-95th percentile: 92%- 100%) and very good spatial accuracy with an average root mean square error of 2 pixels in the detected ROI center when compared to manual labeling. Therefore, the implementation of a multispectral camera fusion algorithm is a valid strategy to improve the reliability of non-contact RR estimation with nearable devices featuring thermal cameras.