I don't have enough reputation to comment, so I'm posting this as an answer. I think you are right. For more details, see the slides from Lecture 2 and Lecture 11 of the course named "TinyML". The link is here. The peak memory of models that use depth-wise convolution is larger than that of comparable models built from regular convolutions.
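To make the peak-memory point concrete, here is a rough back-of-the-envelope sketch. It rests on assumptions of mine, not on anything taken from the slides: the depth-wise layer sits inside a MobileNetV2-style inverted bottleneck with expansion ratio 6, the peak for one layer is approximated as its input activation plus its output activation, and weights, residual connections, and any buffer reuse are ignored. The function names and the example shape are hypothetical.

```python
def activation_elems(channels, height, width):
    """Number of elements in one activation tensor (batch size 1)."""
    return channels * height * width


def layer_peak(in_shape, out_shape):
    """Approximate a layer's peak as input plus output activation elements."""
    return activation_elems(*in_shape) + activation_elems(*out_shape)


def regular_conv_peak(c, h, w):
    """Regular convolution that keeps the channel count and resolution."""
    return layer_peak((c, h, w), (c, h, w))


def inverted_bottleneck_peak(c, h, w, expansion=6):
    """Assumed block: 1x1 expand -> 3x3 depth-wise -> 1x1 project."""
    e = c * expansion
    layers = [
        ((c, h, w), (e, h, w)),  # 1x1 point-wise expansion
        ((e, h, w), (e, h, w)),  # 3x3 depth-wise conv on the expanded tensor
        ((e, h, w), (c, h, w)),  # 1x1 point-wise projection
    ]
    # The block's peak is the largest per-layer peak (residual input ignored).
    return max(layer_peak(i, o) for i, o in layers)


if __name__ == "__main__":
    c, h, w = 32, 56, 56  # hypothetical feature-map shape
    print("regular conv peak elements:", regular_conv_peak(c, h, w))
    print("inverted bottleneck peak elements:", inverted_bottleneck_peak(c, h, w))
```

Under these assumptions the depth-wise layer dominates, because both its input and its output live in the expanded channel space. This says nothing about parameter count or FLOPs, which depth-wise convolution does reduce.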