Many statistical downscaling (SD) methods require observational inputs and expert knowledge, and therefore do not generalize well across regions. Convolutional neural networks (CNNs) are deep learning models that have shown generalization ability across a variety of applications. In this research, we modify UNet, a semantic-segmentation CNN, and apply it to downscaling daily maximum/minimum 2-m temperature (TMAX/TMIN) over the western continental US from 0.25-degree to 4-km grid spacing. We select high-resolution (HR) elevation, low-resolution (LR) elevation, and LR TMAX/TMIN as inputs, train UNet on Parameter-Elevation Regressions on Independent Slopes Model (PRISM) data over the south- and central-western US from 2015 to 2018, and test it independently over both the training domains and the northwestern US from 2018 to 2019. We find that the original UNet cannot generate enough fine-grained spatial detail when transferred to the new northwestern US domain. In response, we add an extra HR elevation output branch with its own loss function and train the modified UNet to reproduce both the supervised HR TMAX/TMIN and the unsupervised HR elevation. We name this modified model “UNet-AE”. UNet-AE supports semi-supervised fine-tuning for unseen domains and shows better grid-point-level performance, reducing mean absolute error (MAE) by more than 10% relative to the original UNet. Based on its performance relative to the 4-km PRISM data, UNet-AE is a good option for generalizable downscaling in regions that are under-represented by observations.
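
The two-branch training objective and the MAE comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the use of MAE for both branches, the `elev_weight` parameter, and the function names `two_branch_loss` and `relative_mae_reduction` are all assumptions made for illustration, and grids are flattened to plain Python lists to keep the example dependency-free.

```python
from typing import Optional, Sequence


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equally sized, flattened grids."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if len(pred) == 0:
        raise ValueError("grids must not be empty")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def two_branch_loss(
    temp_pred: Sequence[float],
    elev_pred: Sequence[float],
    elev_hr: Sequence[float],
    temp_hr: Optional[Sequence[float]] = None,
    elev_weight: float = 1.0,
) -> float:
    """Hypothetical combined loss for a network with two output branches.

    The elevation term needs only HR elevation, so it can be evaluated
    on an unseen domain. The temperature term is supervised and is
    skipped when no HR temperature target is supplied (temp_hr=None).
    The weighting scheme (elev_weight) is an assumption of this sketch.
    """
    loss = elev_weight * mae(elev_pred, elev_hr)
    if temp_hr is not None:
        loss += mae(temp_pred, temp_hr)
    return loss


def relative_mae_reduction(mae_baseline: float, mae_new: float) -> float:
    """Percentage reduction of MAE relative to a baseline model."""
    if mae_baseline <= 0:
        raise ValueError("baseline MAE must be positive")
    return 100.0 * (mae_baseline - mae_new) / mae_baseline
```

Under these assumptions, calling `two_branch_loss` with `temp_hr=None` corresponds to the semi-supervised fine-tuning case, where only the elevation branch contributes to the loss on an unseen domain, while passing HR temperature targets corresponds to fully supervised training. `relative_mae_reduction` expresses a comparison such as the reported "more than 10%" as a percentage of the baseline MAE.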