When creating MultiscaleCNN, you divide the embedding dimension into 3 parts, but 4096 is not divisible by 3. Each sub-network ends up with 4096 // 3 = 1365 dimensions, so the three parts add up to only 1365 * 3 = 4095 instead of 4096. As a quick fix, when initializing DeepCNN you can pass out_dim - (out_dim // 3) * 2 as the remaining (residual) dimension, which is 1366 for out_dim = 4096, so the total comes back to 4096.
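To make the arithmetic concrete, here is a minimal sketch in plain Python. The helper name `split_out_dim` and the `n_branches` parameter are hypothetical and only illustrate the idea; they are not taken from the repository, and the actual constructor arguments of MultiscaleCNN and DeepCNN may differ.

```python
def split_out_dim(out_dim: int, n_branches: int = 3) -> list[int]:
    """Split out_dim across branches; the last branch absorbs the remainder.

    Hypothetical helper for illustration, not part of the original code.
    """
    base = out_dim // n_branches                 # 4096 // 3 = 1365
    last = out_dim - base * (n_branches - 1)     # 4096 - 1365 * 2 = 1366
    return [base] * (n_branches - 1) + [last]


dims = split_out_dim(4096)
print(dims, sum(dims))  # [1365, 1365, 1366] 4096
```

Assuming DeepCNN is the branch that should take the remainder, it would get the last value (1366) and the other two sub-networks would keep 1365 each.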