The skill of surface temperature forecasts up to 4 weeks ahead is examined for weekly tercile category probabilities. These probabilities are constructed by applying extended logistic regression (ELR) to three ensemble prediction systems (EPSs) from the Subseasonal-to-Seasonal (S2S) project (ECMWF, NCEP, and CMA), verified over their common period 1999–2010, and averaged with equal weighting to form a multimodel ensemble (MME). Over North America, the resulting forecasts have good reliability and varying degrees of sharpness. Skill decreases after week 2 and from winter to summer. Multimodel ensembling damps the negative skill present in individual forecast systems but, overall, does not substantially improve skill relative to the best single model (ECMWF). Spatial pattern correction is implemented by projecting the ensemble-mean temperatures in the neighborhood of each grid point onto Laplacian eigenfunctions and using the resulting amplitudes as new predictors in the ELR. Beyond week 2, forecasts and skill improve when the ELR model is trained on spatially averaged temperature (i.e., the amplitude of the first Laplacian eigenfunction) rather than on the gridpoint ensemble mean; no such improvement is found at shorter leads. Adding further Laplacian eigenfunctions, which encode finer spatial detail, as predictors degrades the forecasts, likely because of the small reforecast sample size. Variations of forecast skill with ENSO are limited, whereas relationships with the MJO are more pronounced: skill is highest during MJO phase 3 up to week 3, coinciding with enhanced forecast probabilities of above-normal temperatures in winter.
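The pipeline described above (Laplacian-eigenfunction predictor, ELR tercile probabilities, equal-weight MME) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the study: the "Laplacian eigenfunctions" are approximated by eigenvectors of a 4-neighbour graph Laplacian on a hypothetical 3×3 neighborhood; the ELR threshold transform is taken as g(q) = q; the model is fitted by iteratively reweighted least squares; and all data are synthetic. The function names `grid_laplacian_eigvecs`, `fit_elr`, and `tercile_probs` are hypothetical. With the unit-norm constant mode, the first-eigenfunction amplitude is proportional to the neighborhood mean (a factor of 3 for 9 points).

```python
import numpy as np


def grid_laplacian_eigvecs(ny, nx):
    """Eigenvectors of a 4-neighbour graph Laplacian on an ny-by-nx patch.

    Assumption: this graph Laplacian stands in for the Laplacian
    eigenfunctions. Columns are sorted by increasing eigenvalue, so column 0
    is the constant (spatial-mean) mode and later columns carry finer detail.
    """
    n = ny * nx
    adj = np.zeros((n, n))
    for i in range(ny):
        for j in range(nx):
            k = i * nx + j
            if i + 1 < ny:
                adj[k, k + nx] = adj[k + nx, k] = 1.0
            if j + 1 < nx:
                adj[k, k + 1] = adj[k + 1, k] = 1.0
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)              # eigenvalues in ascending order
    vecs[:, 0] *= np.sign(vecs[:, 0].sum())    # make the constant mode positive
    return vecs


def fit_elr(x, obs, thresholds, n_iter=50):
    """Fit P(obs <= q | x) = logistic(b0 + b1*x + c*q) by IRLS (Newton steps).

    Each (case, threshold) pair is one binary sample, so a single set of
    coefficients covers all thresholds (the ELR idea; g(q) = q is assumed).
    """
    X = np.vstack([np.column_stack([np.ones_like(x), x, np.full_like(x, q)])
                   for q in thresholds])
    y = np.concatenate([(obs <= q).astype(float) for q in thresholds])
    beta = np.zeros(3)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def tercile_probs(beta, x, thresholds):
    """Below-, near-, and above-normal probabilities from a fitted ELR model."""
    def cdf(q):
        return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x + beta[2] * q)))
    p_below, p_not_above = cdf(thresholds[0]), cdf(thresholds[1])
    return np.column_stack([p_below, p_not_above - p_below, 1.0 - p_not_above])


# Synthetic, purely illustrative data (not S2S reforecasts).
rng = np.random.default_rng(0)
n_cases, ny, nx = 200, 3, 3
signal = rng.normal(size=n_cases)                    # predictable large-scale anomaly
obs = signal + rng.normal(size=n_cases)              # verifying gridpoint temperature
terciles = np.quantile(obs, [1 / 3, 2 / 3])          # climatological tercile bounds
vecs = grid_laplacian_eigvecs(ny, nx)

model_probs = []
for noise in (0.5, 1.0, 1.5):                        # three hypothetical EPSs
    patches = signal[:, None, None] + rng.normal(scale=noise, size=(n_cases, ny, nx))
    amp1 = patches.reshape(n_cases, -1) @ vecs[:, 0]  # first-eigenfunction amplitude
    beta = fit_elr(amp1, obs, terciles)
    model_probs.append(tercile_probs(beta, amp1, terciles))

mme_probs = np.mean(model_probs, axis=0)             # equal-weight multimodel ensemble
```

Because the threshold enters the single ELR model as a predictor with a positive coefficient, the fitted cumulative probabilities cannot cross, so the near-normal probability stays non-negative. Using more columns of `vecs` as predictors would correspond to adding finer spatial detail, which the abstract reports degrades the forecasts given the short reforecast sample.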