Hi, I know this has been discussed before; I have read through some of the archives and some of the rationale. I want to raise it again, however, because I don't think anyone has ever presented a good example of where it is really useful on x86 architectures.
In general, it is very useful for selecting different versions of an instruction (byte, word, dword, qword) with a template specialization. I'll post some code that works under Visual C++ 9.0 to demonstrate what I mean. The following function finds the index of the first nonzero "element" of an arbitrarily sized array (or the first zero, with similar template specializations replacing rep with repne), and it is the fastest way I know to do so.

    template<typename T> int __declspec(naked) scas();

    template<> int __declspec(naked) scas<boost::uint8_t>()
    {
        __asm rep scasb        // with SCAS, rep acts as repe: scan while [edi] == al
        __asm mov eax, edi     // edi now points one element past the stop position
        __asm ret
    }

    template<> int __declspec(naked) scas<boost::uint16_t>()
    {
        __asm rep scasw
        __asm mov eax, edi
        __asm ret
    }

    template<> int __declspec(naked) scas<boost::uint32_t>()
    {
        __asm rep scasd
        __asm mov eax, edi
        __asm ret
    }

    // sizeof cannot be evaluated in #if, so test the target macro instead.
    // Illustrative only: MSVC's inline assembler is not available for x64.
    #if defined(_M_X64)
    template<> int __declspec(naked) scas<boost::uint64_t>()
    {
        __asm rep scasq
        __asm mov rax, rdi
        __asm ret
    }
    #endif

    template<typename T> int find_first_nonzero_scas(T* x, int cnt)
    {
        int result = 0;
        __asm
        {
            xor eax, eax       // value to compare against: zero
            mov edi, x         // start of the array
            mov ecx, cnt       // maximum number of elements to scan
        }
        result = scas<T>();    // relies on eax/edi/ecx surviving until the call
        result -= reinterpret_cast<int>(x);
        result /= sizeof(T);
        return --result;       // edi overshoots by one element
    }

This is one example, but it illustrates a general concept that I think is really useful and that I have personally used numerous times for many instructions other than SCAS. If there is a way to achieve this without using a naked function, please advise. I'd rather not resort to an if/then/else when the result of every test is known at compile time.
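For comparison, here is a minimal portable sketch of the same compile-time selection, useful as a reference when checking the assembly version. The names scan_op and find_first_nonzero_ref are hypothetical and not part of the code above. It also returns -1 when every element is zero, which is a choice made for this sketch and not the behavior of the posted function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original post): one specialization per
// operand width, selected at compile time from sizeof(T).
template<std::size_t Width> struct scan_op;  // intentionally undefined for other widths
template<> struct scan_op<1> { static const char* mnemonic() { return "scasb"; } };
template<> struct scan_op<2> { static const char* mnemonic() { return "scasw"; } };
template<> struct scan_op<4> { static const char* mnemonic() { return "scasd"; } };
template<> struct scan_op<8> { static const char* mnemonic() { return "scasq"; } };

// Portable reference: index of the first nonzero element, or -1 if all are zero.
template<typename T>
int find_first_nonzero_ref(const T* x, int cnt)
{
    assert(cnt >= 0);
    // Naming scan_op<sizeof(T)> makes unsupported widths a compile-time error,
    // with no runtime if/else on the element size.
    (void)scan_op<sizeof(T)>::mnemonic();
    for (int i = 0; i < cnt; ++i)
        if (x[i] != 0)
            return i;
    return -1;  // assumption of this sketch: explicit "not found" value
}
```

The width is resolved entirely by the template argument, so a call such as find_first_nonzero_ref(buffer, n) on a boost::uint16_t or std::uint16_t array picks the word-sized specialization without any test at run time.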