Checking only odd numbers is basically skip over a prime "2". You can skip over 2/3/5/7 quite economically, which will save many divisions. The skip block size is 210 (=2x3x5x7), after which you repeat the skip pattern for the next block.
The max prime to test for an arbitrary number N is int(sqrt(N)). Furthermore, you only need to attempt division by primes in that range, but that requires memorizing all previously found primes (the number depends on your max number limit), which can exhaust the storage requirements. The least you can do is only divide by odd numbers in the range.
Last trick is to avoid computing sqrt(N) for each candidate by computing the P squared on the next prime after you passed the previous (P-1) squared. If you combine all the above, then you can further optimize this flow by computing the P squared for the skip blocks. For example, skipping 2/3/5/7 has a repeatable pattern in blocks of 210. For large N (N>211^2), the distance between (P-1)^2 to P^2 will span multiple skip blocks. It's cheaper in computer resources to count the blocks before advancing to the next P squared.
If you have a choice of compute cores to run your program, choose the one that offers early termination of the divide algorithm. Obviously a VLIW core is a big plus.
Have fun!