Comments Locked

108 Comments

  • wavebossa - Friday, January 22, 2010 - link

    I realize that you guys are talking about cleaning up x86 and fully going 64-bit, but let's not get carried away and let's actually focus on the article at hand.

    So let me get this straight: at this point we are only wasting 4-8% of RAM utilization...?

    Why not just, oh I don't know, buy more RAM for now? I mean come on, we are not ready to go fully 64-bit; people still use Windows 98, lol.

    Am I missing something? Please tell me if I am.
  • bcronce - Sunday, January 10, 2010 - link

    "As for your point about Microsoft choosing not to enable it in consumer based OS's, look also at the limitations in place on the commercial OS's not all of those support "36 bit PAE" (all NT based os's support PAE, it's a question of whether or not they use the 36 bits instead of just 32). it's an artificial selling point on the part of Microsoft, there is no reason to limit consumer 32 bit OS's to 4 gigs of ram except to place a "premium" on the ability to use more than that."

    If you ever read MSDN, most consumer-level drivers didn't correctly support PAE, and drivers that don't handle PAE don't play well with 4GB+ memory. Random errors that were hard to track down or reproduce. Essentially your drivers didn't recognize the extra 4 bits, and when Windows said "Hey, I got free memory at 123456 on page 2" the driver went "YAY!! free memory at 123456" and would overwrite data on the wrong page. Since drivers are kernel level, the OS couldn't say "no" and you'd get random corruption/blue screens/weirdness.
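
    As a minimal sketch of the failure mode described above (my own illustration with made-up values, not actual Windows driver code), a driver that keeps physical addresses in a 32-bit field silently drops the upper PAE bits:

        #include <stdint.h>
        #include <stdio.h>

        /* Illustrative only: with PAE the OS can hand out physical addresses
         * above 4GB; a driver that still stores them in 32 bits wraps back
         * into the first 4GB and scribbles over someone else's page. */
        int main(void) {
            uint64_t os_gave_driver = 0x123456000ULL;          /* a page above the 4GB line */
            uint32_t driver_kept = (uint32_t)os_gave_driver;   /* the extra bits are lost   */

            printf("OS allocated page at 0x%llx\n", (unsigned long long)os_gave_driver);
            printf("Driver will write to 0x%llx\n", (unsigned long long)driver_kept);
            /* 0x123456000 vs 0x23456000: the driver corrupts memory below 4GB. */
            return 0;
        }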

    "Than my question would be why branch prediction,speculation on today AMD adn Intel cpu-s takes so much space from the core die."

    Branch prediction takes about 1/15th of the die space of each core on an i7 and is ~94% (±1%) accurate. A quick Google search brought back a few science journals saying about a 10-12% increase in average speed. So, for about 7% of the core die area, you increase the core speed by ~10-12%. Sounds like a decent trade-off... for now. I could see just dropping branch prediction when we start to get a lot more cores, and adding more cores with the saved space.

    RISC vs CISC
    This was more of a debate a decade ago. Nowadays, modern CPUs have 200+ internal registers that are managed by the CPU, so you only see the standard x86 registers. Exposing extra registers helps up to a point, but in a modern desktop computer that's juggling many applications, it's better to let the CPU manage its own registers to help context switches. RISC can be good for low-power situations, though.

    Modern decoders could use a slimming by removing old deprecated instructions. I would be interested in how much die space could be saved by slimming down a decoder unit to work with only the modern versions of old instructions. I know the i7 has quite a complex decoder. It can combine multiple x86 instructions into single internal instructions. When the i7 detects a loop, and the loop can fit all of its converted micro-code into the instruction cache, it will shut down the decoder to save power.

  • HollyDOL - Wednesday, December 9, 2009 - link

    And I say we should all stop using gasoline based vehicles right now.

    It's the same thing... While there are some more or less working alternatives, you would effectively kill all traffic.

    Same with computers... If you decide to throw away x86 (and its 1234 extensions) or decide to reduce the set in favor of trashing outdated instructions, you are no longer backward compatible and you are causing huge havoc... Will my software run on this reduced set or that reduced set? What do I need to make it work?

    Even though x86 plus extensions gets more and more bloated over time, you have to keep backward compatibility in mind. And emulators? I have yet to see one that works 100%.

    Extensions being useless? Another point that I personally consider false. My computer runs the SETI@home application... and the same task is done almost twice as fast using the optimized application compared to the normal one. Of my own software, I have one application cross-comparing huge sets of data. Using the 32-bit ISA I can compare 32x32 records in one step... running the same software 64-bit I compare 64x64 records at once... yay, just recompiling the app made it 4 times faster...
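
    To illustrate the kind of win described above (my own sketch, not the poster's actual code): if each record is reduced to one bit and cross-compared with bitwise operations, every loop step handles one machine word, so a 64-bit build processes twice as many records per step in each dimension just by recompiling.

        #include <stddef.h>
        #include <stdint.h>

        typedef uintptr_t word_t;   /* 32 bits when built for x86, 64 bits for x86-64 */

        /* Count positions where both record sets have the bit set.
         * Each iteration compares one word's worth of records at once. */
        size_t cross_compare(const word_t *a, const word_t *b, size_t n_words) {
            size_t hits = 0;
            for (size_t i = 0; i < n_words; ++i) {
                word_t common = a[i] & b[i];     /* 32 or 64 records in one AND */
                while (common) {                 /* count the set bits */
                    common &= common - 1;
                    ++hits;
                }
            }
            return hits;
        }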

    I don't like the ISA getting bigger and bigger, but I understand there is a reason behind it. Taking it to extremes: if not for ISA extensions, we would still be running floating-point operations in software emulation.

    my 2 cents
  • - Thursday, December 10, 2009 - link

    The PC is a dinosaur and its x86 instruction set is a dodo bird. Relative to the iPhone, the PC is an underpowered, overpriced shell of “could have beens”. The iPhone gives the illusion of new-age productivity: heck, I can flick my thumb to scroll, use two or three fingers to enlarge on two planes or three; I can watch movies, listen to music, and take it with me. Talk, internet, text, take and send pics… My computer uses a mouse, my productive programs use menus as I click, click, click to get things done; and oh, the complexity of Word, Photoshop, C++. Rendering a photo still takes time at any cost, and who wants to spend $700 on a CPU for a 20% savings in productivity on a Saturday afternoon? And ya know, even if there were apps like the iPhone's for a PC, it would take a quad-core CPU+GPU to run and do what the iPhone does, thanks to Microsoft. It finally appears that Intel will no longer hinder the progress of the CPU; only now does it finally meld with the coprocessor; a hybrid, a fusion: the king is dead, long live the king.

    asH
  • RadnorHarkonnen - Friday, December 11, 2009 - link

    I giggled like a little girl.

    Have you tried to simulate a Cisco router on an x86 CPU? A 16MHz one can bring the latest quad core to its knees. And running a GUI on a Cisco router? Impossible.

    Different chips do different things, mate. And btw, I prefer my APhone.
  • ProDigit - Wednesday, December 9, 2009 - link

    Should Windows ever be written for an ARM processor, we'd see battery life gains, and probably also cost reductions.
    If MS put its focus away from the x86 architecture, Intel would be done for!
    And AMD would rule, because they have knowledge of other architectures thanks to their GPU building.

    Then the world would be much easier and simpler!
    Intel would probably start making chips that support other architectures, and probably get some optimizations on them;
    But for some reason it's MS that decides what architecture leads!

    With many more newer Linux distributions supporting other architectures, AMD and Intel could start making those chips too, but I think they know that those markets are rather small...

    If MS came out with its first Windows Mobile platform that ran on anything other than x86/x64, we'd probably see a huge leap from one to the other.

    Netbooks, MIDs, cellphones, all would be able to run Windows, and quite energy-efficiently!
    I believe some netbooks have been tested where a certain Linux would give 5-6 hours on x86 CPUs, while giving between 8 and 10 hours on the ARM architecture.
    The ARM processor was a bit slower than the Atom, but nevertheless, if we'd see 2 hours of battery life gained on netbooks (or about a 30% gain), that would be something to look forward to!

    That's a big step forward!
  • Scali - Thursday, December 10, 2009 - link

    I don't think MS decides.
    Back in the days of Windows NT 4, Microsoft supported x86, MIPS, Alpha and PowerPC (and for this reason, Windows NT was designed from scratch to be highly portable... In fact, the 32-bit Windows NT executable format is even called Portable Executable).
    You could run x86 binaries via a dynamic recompiler.

    By the time Windows 2000 arrived, most of these architectures were no longer being used in servers or workstations, so they only supported x86 and Itanium (where Itanium again got a dynamic recompiler for x86 after it turned out that the hardware-emulation of x86 wasn't that efficient).
    Later they added x86-64 support, and that's where we are today. Aside from Itanium, non-x86 systems are no longer supported, because there's nothing that really has any significant marketshare worth supporting.

    Of course, the alternative Windows CE/Mobile works on ARM, and supports most of the Windows API. But until recently, ARM was not powerful enough to run a complete up-to-date desktop OS, so there was no point in doing a 'full port' of the regular x86 Windows. Perhaps that time will come once ARM-based netbooks become more popular.
    It seems that MS would like to bring Windows 7 and Windows Mobile closer together anyway.
  • cbemerine - Wednesday, December 30, 2009 - link

    "...But until recently, ARM was not powerful enough to run a complete up-to-date desktop OS, so there was no point in doing a 'full port' of the regular x86 Windows. Perhaps that time will come once ARM-based netbooks become more popular. ..."

    I would suggest that it has more to do with a lack of marketing than with the desktop OS, as I had a computer in the palm of my hand in 2005/2006 thanks to Linux, Maemo, OS 2008 and the Nokia Nxxx (N770/N800). If only Nokia had marketed it more effectively. Of course Google will do a much better job advertising their new open Linux Google Android phone than Nokia did/does, so perhaps that time is now.

    If by saying ARM is not powerful enough to run a complete up-to-date desktop OS you are referring to MS Windows specifically, then you are correct. The bloat and excessive memory requirements speak for themselves. I still lament the inability to reformat the hard drive, remove all the bloat and run efficiently. With auto updates and auto upgrades the bloat is back after the first update, and the poor user simply does not have any choice in the matter.

    However, Linux has been running very effectively for many years in the embedded device space, with small amounts of RAM and slower processors. I doubt you would suggest that Linux is not a desktop OS? At least I hope not.

    Now the Nokia N900 has cellular... I had everything else I needed with the Nokia N800.

    With the Nokia N800 I had: GPS, the H.264 high-def video codec, a webcam, 2 memory slots (I still have not filled up my two 4GB Micro SD memory cards, and I see both 16GB and 32GB for sub-$20 via Amazon of all places... it was a special months ago), a full web browser, mic/sound jacks and speakers; WiFi; Bluetooth; a touchscreen with stylus; a full-size Bluetooth keyboard; and, most important of all, root account access (so you can tweak/configure applications); an FM chip...

    In fact there were over 450 apps available for the N800 and OS 2008 (Linux Maemo) last year, not counting the many Linux repositories where you could download and install apps on the device.

    Now in addition to the Nokia N900, 1st quarter 2010 will bring the first unlocked (root accessible with blessings of company from the beginning) Google Android.

    It's a shame the ability to install many applications on a computer this size, available since 2005/2006, has not been adequately advertised and marketed. Everyone I showed mine to wanted one. How many people believe the Nokia N900 is the first, when the only thing it has that the N800 (2006) does not is cellular; that's it.

    If you prefer the iPhone, Windows Mobile or other vendor locked OS and hardware, well that is a choice you made.

    The ARM is definitely powerful enough to run a desktop OS, but not every desktop OS. There are many versions of Linux, embedded or not, that will simply scream on that footprint (processor and memory). After all there are many Linux distros that will run just fine in 128MB of RAM, more is better, but they will do it!

    Perhaps you have picked the wrong desktop OS!
  • Scali - Monday, January 4, 2010 - link

    '2006' qualifies as 'recently' to me, making the rest of your post void.
  • Penti - Saturday, December 12, 2009 - link

    Back in those days Microsoft even helped design non-x86 systems.

    Windows NT was actually originally designed for Intel's RISC chip, the i860, but that target was dropped before completion. Or really OS/2, we might say.

    Anyway, Intel not developing Itanium does of course affect things; IA-64/Itanium is dead. New products aren't really coming. Compaq killed Alpha support when they bought DEC. And so on. The world wasn't so uniform before. Now there's mainly ARM, x86 and some POWER and SPARC; MIPS is still there in the embedded space too. In the mid-90s MIPS, POWER, x86, SPARC, Alpha and PA-RISC were all big, and ARM was brewing then too. Of course the end of PA-RISC development also stopped the old HP-UX market from evolving; it didn't pick up much steam on Itanium. The latest Itaniums are dual-core 1.66GHz 90nm processors and didn't get hardware virtualization until 2006. So it's understandable. Of course you'd much rather run high-end x86 servers, or even POWER or SPARC for that matter. Microsoft has no real interest in continuing support for Itanium either. They did release Server 2008 for it, however. Those machines were mainly used for MS SQL Server anyway, and ancient hardware makes it unappealing for even that.

    But of course MS did have a role there. They decided to run with x86. By the time 64-bit computing became interesting for their market, both AMD and Intel had come out with x86 CPUs supporting it, so it was only natural for the databases to move there. Meanwhile Sun and IBM still have their UltraSPARC and POWER servers; they made more strategic choices, so the software was developed together with the hardware. MS could have supported IA-64 better and pushed it. So could Intel. It's not as if we have to have the same CPUs in our desktops as in our servers, but it's mainly Intel that has decided that. They did try RISC CPUs, EPIC and so forth, but x86 is where they succeeded where they failed with the others. Itanium didn't really get any real server features, so there's one place where maybe they didn't learn so much.

    It was Intel in the 90s that was able to compete with and outdo MIPS and Alpha systems. I can think of the Pentium Pro, which was a huge performer; Alpha was preferred for some high-end use for a while, but was killed by Compaq and the failure of DEC. Apple is, in a way, the reason Windows never took off on PowerPC: no PReP-compatible Power Macintosh was released, and clones were killed quickly. But all of that is pretty moot, because people developed for Windows to get away from the need to support multiple platforms and architectures. In a way the success of x86 is that it isn't bound to a single vendor; that broad adoption couldn't really be achieved by anyone else. Of course the home and lower-end corporate market has a lot to do with it. DOS lived for a long time; where games are concerned, it wasn't really until '97 that we started to see Windows games. The market moved to consolidate the diverse industry it had been.

    Apple shows, however, that MS could have switched architectures if they had wanted to. But no such move was ever made, and then came the Pentium Pro as far as workstations and servers are concerned. Then the P2, P3, etc. All the rest were ancient history by then, when talking about non-vendor-specific systems. Apple got the Power desktop market, Sun moved away from their SPARC workstations, the workstation market as a whole disappeared, and the only player in the desktop market was really x86. And it was really the only way to move away from the kind of multi-vendor market with vendor lock-ins that was prevalent. That applied to servers too.

    Regarding binary emulation and such: it isn't until recently that it could really be done well, so I think it was right that, for example, Apple didn't switch to x86 with OS X right away; they needed that backwards compatibility for the Classic environment. So backwards compatibility matters, and it's only recently that the software has turned up and the hardware is good enough to do it. Dropping x86 wouldn't have been easy 15 years ago, and it's not easy today with even more legacy and baggage. But I don't think we need to any longer; as with the jump to the Pentium Pro, we have really fast and advanced products on x86 today. Dumping x86 only to develop an x86 equivalent doesn't make sense. Hardware-wise, the legacy of the oldest stuff is already gone, and I don't think it's a real problem for the decoders to handle what remains. The older stuff can be emulated just fine anyway (see QEMU). But 16-bit BIOSes aren't completely gone yet, even though they have stopped developing them. Peripherals are important too, just to note. x86 has shown it can reinvent itself without resorting to a new ISA; that is where its power lies. The legacy has just helped x86 CPUs. MS was in a way tied to x86 too, as they were expected to continue supporting the machines sold with MS software. So it's not surprising to see them shine in a non-multi-vendor and more unified software climate.
  • Scali - Monday, December 14, 2009 - link

    Itanium isn't dead. Intel still has 'big' plans for it.
    They want to skip a node and get Itanium on the same manufacturing process as x86, and they want to use the same sockets/chipsets/motherboards on both x86 and Itanium servers.
    I think Itanium may temporarily have gotten less attention, when Intel needed to turn the Pentium 4 into the current Core2 and Core i7 success.

    Binary emulation was applied even in the early Alpha versions of NT4, to run x86 software, as I said. So it has been possible for more than a decade. The main problem back then was that Alpha was aimed at corporate servers and workstations (much like Itanium), so it wasn't affordable to regular users.
    But Alpha/Itanium could have trickled down into the consumer market eventually.
  • Penti - Monday, December 14, 2009 - link

    You know by whom? By Digital, that's right, not Microsoft; and it's Intel who supplies it for Itanium too. Microsoft didn't themselves support many of the platforms they released NT on. That was my point. That's also why it didn't catch on: MS didn't put in any real effort to support them.

    They wanted to skip nodes? No 65 or 45nm part has been released. That's right, but development has still stagnated both software- and hardware-wise; it didn't become the PA-RISC replacement for HP, and it didn't become the true mid/high-end server replacement for Intel. So it's pretty dead; there's no reason to run the Windows Itanium version today. Tukwila will still be a 65nm chip; it's the improvements to the memory controller that have taken time. That chip will be released next year, and later they will jump to 32nm, but that's not relevant to the observation that it's been years since a product release.

    As regards binary emulation/translation, it wouldn't have been the right time for Apple to do it in 2001, that's all I said. It's much more mature today anyway. Anyhow, by the time 32nm Itaniums arrive we will have Sandy Bridge coming out. Anyhow, you were mistaken about the current situation; the 65nm Tukwila delay has nothing to do with what you mentioned. Also, FX!32 hardly lived two years. Then MS dropped the Alpha port, together with HP dropping Alpha support. Intel's & HP's Itanium didn't replace the Alpha, PA-RISC and x86 parts. x86 caught on again, and development stagnated for Itanium as said, so it could probably be regarded as pretty dead. There are still reasons for buying UltraSPARC and POWER machines, but I fail to see the benefits of Itanium today. That's because it failed to replace PA-RISC for HP and failed to become a high-end Windows server platform. The number of Itanium systems sold is maybe 60,000-80,000 a year! Most aren't running Windows. HP accounts for 95% of the sales.
  • Scali - Tuesday, December 15, 2009 - link

    Get a grip, mate :)
    Use some proper punctuation :)
  • Penti - Tuesday, December 15, 2009 - link

    Irrelevant. But yeah, it was drawn out too long anyhow. But you're not responding to the point anyway.
  • Penti - Tuesday, December 15, 2009 - link

    Well, let's do it this way.

    1. Digital made and released FX!32 (binary emulation + translation); Digital pretty much supported NT themselves.

    2. Tukwila, a 65nm part, was delayed, and it has nothing to do with jumping a node. It's delayed because of the work involved in the memory controller and QPI architecture. Poulson will be 32nm and released in 2011. Tukwila will be a 4-core 65nm chip. It will use the same chipset as the (Nehalem-EX) Xeon MP platform.

    3. New products in the Itanium line aren't really coming out as expected.

    4. x86 has caught up to the point where Itanium actually needs to catch up significantly with x86 development. There are no evident RAS advantages. Currently it has aged badly.

    5. Itanium failed to be a unified RISC/EPIC product. Windows on Itanium didn't catch on, and HP-UX development stagnated to the point where most users migrated. Most didn't move away from PA-RISC machines either and continued buying them till the end. Shops are replacing DB/MS SQL Server Itanium machines with x86 ones now.

    6. Virtually only HP sells Itanium machines. Linux and HP-UX sell better than Windows Server.

    x86 did catch on before, and did it again. It's where the development goes, and it can reinvent itself fine regardless of the need for backward compatibility. Emulation won't provide full compatibility, requires much work, and will be painfully slow if just running in "emulation mode". FX!32 created a kind of native code the first time you started an app. You can always do like Transmeta and create a special architecture for running ISA emulation, but there's no apparent benefit.

    Finally, operating system vendors have been poor at supporting or writing this kind of emulation; I mentioned Digital and Intel here. The OS vendors didn't support it. Apple brought in Rosetta from outside the company; that's why they couldn't have done something like it before. Needless to say, Rosetta didn't provide a fully compatible environment for the applications. Microsoft, therefore, didn't themselves support the endeavors onto other platforms, and that contributed to the failure. MIPS was killed pretty early on; there was no binary translator there. Of course, today you can always do full-system emulation like QEMU or even Bochs. VPC worked, kinda. But it's painful, so most would rather skip it.
  • Scali - Wednesday, December 16, 2009 - link

    I don't feel like covering all these points, as you seem to be looking for an argument more than directly addressing what I said.

    But I do want to point out that Apple's Rosetta was not the first, but the SECOND time that they used emulation to cover a transition in CPU ISA. The first time obviously being the move from 68k to PPC.
    I don't think it's relevant whether the OS vendor supplies this solution, or if it's supplied by a third party, as long as it works.
  • ProDigit - Wednesday, December 9, 2009 - link

    With the above post I had hoped that it would be possible to run Windows on ARM technology, while regular Windows programs continued to function just like they do today!

    But I guess now that THAT would be pretty impossible, unless you're running these programs on a virtual platform.

    Sometimes I guess solutions are not as simple as we think they are!
  • rs1 - Tuesday, December 8, 2009 - link

    I don't think having a standardized procedure for x86 instruction set extensions would improve upon any of the issues that Agner raises. For instance, he cites the following:

    -- "The total number of x86 instructions is well above one thousand" (!!)

    And if there were a standardized method for adding instructions, then there would likely be just as many, if not more. Having a standard procedure for adding instructions to the x86 instruction set doesn't mean that people are going to stop doing it.

    -- "CPU dispatching ... makes the code bigger, and it is so costly in terms of development time and maintenance costs that it is almost never done in a way that adequately optimizes for all brands of CPUs."

    You have to deal with this whether or not there is a standard procedure for extending the x86 instruction set. The only way to avoid it would be to either start working with something other than x86, or reduce the size of the existing x86 instruction set, and then disallow future additions.
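
    For readers unfamiliar with the term, a minimal sketch of what CPU dispatching looks like in practice (hypothetical function names; the GCC/Clang builtins are an assumption, and real dispatchers often read CPUID directly): every vendor- or extension-specific code path adds another branch that has to be written, tested and maintained.

        #include <stdio.h>

        /* Stand-ins for what would really be separately tuned code paths. */
        static void scale_plain(float *d, int n) { for (int i = 0; i < n; ++i) d[i] *= 2.0f; }
        static void scale_sse2(float *d, int n)  { scale_plain(d, n); /* imagine SSE2 intrinsics */ }
        static void scale_avx(float *d, int n)   { scale_plain(d, n); /* imagine AVX intrinsics  */ }

        /* Runtime CPU dispatch: pick an implementation based on what the CPU reports. */
        static void scale(float *d, int n) {
            __builtin_cpu_init();                    /* GCC/Clang-specific builtins */
            if (__builtin_cpu_supports("avx"))
                scale_avx(d, n);
            else if (__builtin_cpu_supports("sse2"))
                scale_sse2(d, n);
            else
                scale_plain(d, n);
        }

        int main(void) {
            float v[4] = {1, 2, 3, 4};
            scale(v, 4);
            printf("%g %g %g %g\n", v[0], v[1], v[2], v[3]);
            return 0;
        }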

    -- "the decoding of instructions can be a serious bottleneck, and it becomes worse the more complicated the instruction codes are"

    And again this issue still needs to be dealt with either way. Having a standard procedure for adding a new instruction to the ISA doesn't mean that the instruction being added is going to be any less complex to decode.

    -- The costs of supporting obsolete instructions is not negligible. You need large execution units to support a large number of instructions. This means more silicon space, longer data paths, more power consumption, and slower execution.

    While true, this has more to do with the nature of x86 itself. Having a standard way to add new instructions doesn't negate the need to preserve backwards compatibility.


    It seems to me that what Agner really wants, or at least the position his points argue for, is to replace x86 with a RISC-style ISA. Having a standard way to add new instructions into x86 changes nothing fundamental about the ISA and the pros and cons that go along with it. And truly addressing the issues that Agner raises would require such fundamental changes to the ISA that there'd be no point in calling it "x86" any more at that point.

    Of course, I think having standards in place regarding adding extensions to the x86 ISA is a fine idea, but it is definitely not going to fix any of the issues that Agner raised. You'd need to switch to an entirely different ISA to do that.
  • Agner - Monday, December 28, 2009 - link

    Thank you everybody for discussing my proposal.

    rs1 wrote:
    >-- "The total number of x86 instructions is well above one thousand"
    >And if there were a standardized method for adding instructions, then
    >there would likely be just as many, if not more.
    Please read my original blog post: http://www.agner.org/optimize/blog/read.php?i=25 I have argued for an open committee that could approve new instructions and declare other instructions obsolete. There would be fewer redundant instructions if all vendors were forced to use the same instructions. For example, we wouldn't have FMA3 on Intel and FMA4 on AMD. And we wouldn't have new instructions that are added mainly for marketing reasons.

    >It seems to me that what Agner really wants, or at least, the argument that
    >the points he brings up support, is to replace x86 with a RISC-style ISA.
    No, I never said that. Itanium failed because the market wants backwards compatibility. And the CISC instruction set has the advantage that it takes less space in the instruction cache.

    >-- "CPU dispatching ...
    >You have to deal with this whether or not there is a standard
    >procedure for extending the x86 instruction set.
    You would certainly need fewer branches; and it would be easier to test and maintain because all branches could be tested on the same machine.

    >-- "the decoding of instructions can be a serious bottleneck, ...
    >And again this issue still needs to be dealt with either way.
    >Having a standard procedure for adding a new instruction to the
    >ISA doesn't mean that the instruction being added is going to be
    >any less complex to decode.
    The current x86 encoding is the result of a long history of short-sighted patches rather than long-term planning. That is the reason why the decoding is so complicated. We need better planning in the future.

    >-- The costs of supporting obsolete instructions is not negligible...
    >While true, this has more to do with the nature of x86 itself.
    >Having a standard way to add new instructions doesn't negate the
    >need to preserve backwards compatibility.
    I have argued that it is impossible to remove obsolete instructions in the current situation for marketing reasons. But an open committee would be able to declare that for example the x87 instructions are obsolete and may be replaced by emulation after a number of years.

  • jabber - Tuesday, December 8, 2009 - link

    I gave up caring about them in the late 90's after all the excitement of MMX transpired into... well, nothing really.
  • epobirs - Tuesday, December 8, 2009 - link

    Not nothing. It made a significant difference for certain types of apps. MMX was what made software decoding of DVD possible before CPUs became otherwise fast enough. The pre-MMX 233 MHz Pentium couldn't do it, but the MMX-equipped 233 MHz Pentium could with updated software. This was a big win for selling new PCs offering DVD playback without the expense of a hardware decoder. It gave a nice boost to plenty of other apps like Photoshop. If you were a heavy-duty user making your living with that app, it was enough to make a new machine very attractive. Back when DSP boards for the Mac costing $5,000 gave a similar boost, I used to have clients who said they'd make up the cost on one major job by getting it done in four days instead of five. Back then, speedier CPUs appeared only at long intervals. By the time MMX was introduced, new speed grades were getting pretty frequent, making it harder to appreciate what SIMD brought to the table.

    It didn't set the world on fire, but it was a worthwhile addition to the processor. As transistor real estate gets ever cheaper and more compact, it makes very good sense to create instructions that maximize throughput on frequent operations. Another good example is dedicated silicon in GPUs for offloading video playback operations. The cost of this little bit of chip space is so low it doesn't make sense to bother producing systems without it, even if they might never perform any video playback.
  • mamisano - Tuesday, December 8, 2009 - link

    AMD created the SSEPlus project to help with some of the issues presented in this article.

    http://sseplus.sourceforge.net/

    [quote]In March 2008, AMD initiated SSEPlus, an open-source project to help developers write high performing SSE code. The SSEPlus library simplifies SIMD development through optimized emulation of SSE instructions, CPUID wrappers, and fast versions of key SIMD algorithms. SSEPlus is available under the Apache v2.0 license.

    Originally created as a core technology in the Framewave open-source library, SSEPlus greatly enhances developer productivity. It provides known-good versions of common SIMD operations with focused platform optimizations. By taking advantage of the optimized emulation, a developer can write algorithms once and compile for multiple target architectures. This feature also allows developers to use future SSE instructions before the actual target hardware is available.[/quote]
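
    The "optimized emulation" idea in a nutshell, sketched with hypothetical names (this is not SSEPlus's actual API, just the pattern the quote describes): the same call maps to a native instruction on targets that have it, and to an equivalent scalar fallback everywhere else, so the algorithm is written once.

        #include <stdint.h>

        #ifdef __SSE4_1__
        #include <smmintrin.h>
        typedef __m128i vec4i;
        /* Native path: one SSE4.1 instruction. */
        static inline vec4i vec4i_max(vec4i a, vec4i b) { return _mm_max_epi32(a, b); }
        #else
        typedef struct { int32_t v[4]; } vec4i;
        /* Fallback path: equivalent scalar code for older targets. */
        static inline vec4i vec4i_max(vec4i a, vec4i b) {
            vec4i r;
            for (int i = 0; i < 4; ++i)
                r.v[i] = a.v[i] > b.v[i] ? a.v[i] : b.v[i];
            return r;
        }
        #endif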
  • redpriest_ - Tuesday, December 8, 2009 - link

    Johan, if you support both vendors in separate libraries, the one that isn't being used won't get loaded; it's not like you'll clutter up Icache space since it's incompatible code and will never get executed anyway. The loss is more of a code management headache and a marginal amount of extra disk space.
  • redpriest_ - Tuesday, December 8, 2009 - link

    Also, let me add that infrequently used instructions often get implemented in microcode in successive revisions. While there is a nominal silicon penalty, it is very small. This is more of a verification nightmare for the companies involved in implementing them than anything else.
  • JohanAnandtech - Tuesday, December 8, 2009 - link

    Are you sure that in the case of the hypervisor the extra code will not be loaded into RAM anyway? (Depends on how it is implemented, of course.)

    The fact remains that it is extra code that must be checked, extra lines that can contain bugs. There is really no reason why AMD's AMD-V and Intel's VT-x ISA extensions are different.

    And again, let us not forget the whole vMotion/live migration mess. It is not normal that I cannot move VMs from an AMD to an Intel server. It is a new form of vendor lock-in. Who is responsible for this is another matter... but a decent agreement on a standardized procedure would do wonders, I think. And it would pave the way for fair competition.
  • azmodean - Monday, December 7, 2009 - link

    Some food for thought: if all your "killer apps" weren't closed source, you COULD migrate to a new, more efficient processor architecture.

    All the apps I use run on x86, Alpha, ARM, PowerPC, and anything else you care to throw them at; how about yours?
  • rs1 - Wednesday, December 9, 2009 - link

    I doubt it's that simple. The fact that things like Windows and Office and other "killer apps" are closed-source does nothing to stop their owners from compiling the sources for a different CPU architecture, and I'm sure they would do so if it were feasibly doable and if there were a reasonable market for that sort of thing. The problem (if we ignore the "reasonable market" requirement for now) is that it is probably not currently feasible.

    Let's start with Windows, since that is a basic prerequisite for most "killer apps" to function without significant modifications to their code. The Windows source code almost definitely contains some sections that are written not in C, and not in C++, but in x86 assembly language. Simply cross-compiling those sections for a different target architecture is unlikely to work, and even if it does work, it is even more unlikely to give correct results. All such sections would need to be re-written in assembly language specific to the desired target architecture (or replaced with an equivalent implementation using a higher-level language, if possible). That kind of work falls well outside the abilities of the average computer user. It probably falls outside the abilities of most programmers as well, when considering the complexities of an operating system, and the strictness of the requirements (all ported assembly code needs to remain fully compatible with the rest of the Windows codebase).

    And without a full and correct port of Windows, it doesn't matter how many other popular Windows apps are made open-source, as there'd be no platform that could run them without substantial modifications to their source code. Granted, there are likely a handful of programmers skilled enough to port the Windows codebase to a different architecture, and they might even be willing to do so if given the chance. But I'd wager that their efforts would take well over a year to be complete, and would likely be bug-ridden and not fully correct regardless. And in any case, in no way would the average computer user be empowered to migrate to a different CPU architecture simply by having the code for everything be open-source. There are simply more barriers involved than just having access to the source code.
  • Zan Lynx - Monday, December 14, 2009 - link

    Windows has already been ported to other CPU types. Microsoft has an Itanium version of Windows and of course amd64.

    Since they have ported to Itanium, I don't think it'll be too difficult for them to port to anything else.
  • Scali - Tuesday, December 8, 2009 - link

    Well, just being open source isn't enough.
    The code also needs to be written in a portable fashion. A lot of open source software would initially only compile under 32-bit Linux on x86 processors. Porting to 64-bit required a lot of fixing of pointer-related operations... and porting to big-endian architectures often isn't as simple as just recompiling the source either.
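
    A small example of the kind of code that makes "just recompile" fail (my own illustration, not from any particular project): it appears to work on 32-bit little-endian x86 but breaks when pointers grow to 64 bits or the byte order changes.

        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        int main(void) {
            /* 1. Stuffing a pointer into a 32-bit int: fine on 32-bit x86,
             *    truncates the address on a 64-bit build. */
            int dummy;
            void *p = &dummy;
            unsigned int as_int = (unsigned int)(uintptr_t)p;   /* loses the upper bits on 64-bit */
            printf("pointer %p stored as 0x%x\n", p, as_int);

            /* 2. Reading raw bytes as an int: assumes little-endian layout,
             *    so a big-endian machine sees a different value. */
            unsigned char bytes[4] = {0x01, 0x00, 0x00, 0x00};
            unsigned int value;
            memcpy(&value, bytes, sizeof value);
            printf("value = %u (1 on little-endian, 16777216 on big-endian)\n", value);
            return 0;
        }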

    Another solution to the problem is languages based on virtual machines, such as Java, .NET, or the various scripting languages. They only need a VM for the new architecture, and then your software will work.
    I think this is a better solution than open source in most cases, since the sourcecode doesn't have to deal with architecture-specific issues at all, and you won't have endianness or pointer-size related issues, among other things.
  • Cerb - Tuesday, December 8, 2009 - link

    Personally, I like this method. Now that RAM is getting sufficiently cheap for even mobile devices to have 64MB or more, the overhead of such implementations is less of a concern (once 256MB becomes common, it will all but evaporate), and ARM is certainly ready for it (ThumbEE).

    Actual performance tends to be fast enough not to worry about for most applications (even considering a battery, which will only become a more common issue over time), and with performance-statistics gathering and profile-guided optimization added in (as in, running on your programs based on how you regularly use them), you could beat static programs in oft-used branchy code, and reduce various odd performance bottlenecks that static compilers either can't account for or are not allowed to address.

    ...we just need an entire user-space software system that does not allow pointers, yet can use languages that are not soul-sucking enterprise-friendly ones and do not rely on platforms for such languages (such as Java). Far easier said than done.
  • JohanAnandtech - Monday, December 7, 2009 - link

    I have to agree that open source has an advantage when adopting new ISAs, even within the x86 world. The speed at which Linux adopted and used x86 64-bit to its full potential was very impressive compared to the Windows world (where 64-bit is still causing trouble on the desktop).

    Then again, if you have invested years of your own paid workforce in a piece of software, I don't think it is viable to open-source it. So for some software, closed source might continue to be the most efficient strategy. And in that case we don't want x86 to go away, but to be more standardized, so devs do not have to worry about extra code to debug.
  • azmodean - Monday, December 7, 2009 - link

    While I am an open-source developer, I have my user hat on right now. My point is that the user's ability to migrate to a new architecture is empowered by the use of open source technologies.

    I think the ability of the software to migrate in this way is going to be a telling advantage if ARM and specifically TI's OMAP platform continue to appear in more and more high-end ultraportable devices. Now to be fair, only 4 out of the 5 retail OMAP devices I can think of use Linux (N900, Droid, TouchBook and Pandora use Linux, but not the iPhone), but even the holdout iPhone heavily utilizes Open Source software.

    Back to the topic at hand though, if the Open Source ecosystem gains enough of a foothold, it's possible that it would allow new architectures to break into some areas of the CPU market. I'm not holding my breath for x86's stranglehold on the desktop/laptop market to go away any time soon, but perhaps we'll have a bit more competition between x86 and ARM at least on the extreme low-power end of the scale.
  • haplo602 - Tuesday, December 8, 2009 - link

    You are forgetting that the CPU is only one part of the system. Peripheral device drivers are the major problem for wide OSS adoption.

    E.g., I can run Linux on my old PA-RISC workstation, but only in a text console, as the gfx card has no support in Linux (and never will have). Same for other devices.

    OSS can only go so far on its own.

    I admired the PPC ISA once. It was a nice piece of work. I work with PA-RISC and Itanium systems at work and I think they are quite good alternatives, but again the device driver support is an issue. You simply cannot put an Nvidia card into an Itanium workstation and expect to game on it :-)
  • AluminumStudios - Tuesday, December 8, 2009 - link

    If clean, simple, well maintained instruction sets were really necessary, x86 wouldn't have won and the various dead or near-dead RISC architectures would still be around.

    The world wanted backwards compatibility as well as features (and prices) that the owners and makers of better architectures couldn't or wouldn't give. So we evolved to the current x86 state. Just as the cost and danger of cutting out every human's appendix is too high to make it practical as a matter of course, there's nothing that can be done about x86. Intel and AMD have gotten pretty good at engineering bigger and fatter chips. I'm happy enough without that extra 8% of power savings or performance.
  • Entz - Monday, December 7, 2009 - link

    Companies are not going to give up their source code. Too much time and money was spent developing it, only to give it to all your competitors for free. This is even more important to middleware vendors, such as game engine developers.

    The better approach would be to have all applications compiled to an intermediate language (i.e. Java / .NET), and then have optimized compilers and libraries built into the OS for specific processors, provided by Intel/AMD. Then let them go nuts on x86 instructions. Those can be open source and maintained by a community.

  • SixOfSeven - Monday, December 7, 2009 - link

    If the instruction set is getting to be too much of a mess, it presents an opening for a processor which implements a subset of this hairball and leaves the rest of the work to the compiler, relying on faster execution, smaller chip area (these days, giving more cores in the same space), etc. to compensate.

    If we're at this point, we should see such a processor; if we don't see such a processor, things either aren't so bad or we're missing an opportunity to make a lot of money. Take your pick.

    Yes, I realize the article is mostly talking about different instruction sets across the two manufacturers. But the underlying problem, if it is a problem, is the idea that the instruction set is the place to locate new functionality.
  • Scali - Monday, December 7, 2009 - link

    I think we've been at this point for many years... Thing is, every time such a processor is launched onto the market, it is killed with brute force.
    One example is PowerPC... when it was first being used in the early 90s, PowerPC was a good alternative to x86, generally delivering better performance.
    However, since Apple/Motorola/IBM didn't have such a large market as Intel/AMD had, they didn't have the same amount of resources to keep improving the CPU at the same rate as x86 did.
    A few years ago, Motorola stopped development of the PowerPC altogether... Apple turned to IBM for PowerPCs for a while, but eventually moved to x86.

    I think that if PowerPC development had the same resources as x86, it would probably still be ahead of x86 today.
  • alxx - Wednesday, December 9, 2009 - link

    You mean all those millions of Power ISA chips in the PlayStation 3, Xbox 360 and Nintendo Wii?
    Plus those used in embedded and communications applications, not to mention IBM's POWER series (the Power ISA includes IBM POWER, PowerPC and the Cell PPE).
  • zonan4 - Thursday, December 10, 2009 - link

    I just want to play games... as long as it makes my games faster I don't care about this... move along, people
  • Scali - Thursday, December 10, 2009 - link

    They may be PowerPC CPUs, but they aren't competitive with desktop x86 processors in terms of performance (well, Cell is a special case, but its performance comes from its special parallel design, not from the fact that it uses the PowerPC ISA).

    POWER is not PowerPC. I was specifically talking about PowerPC.
  • Penti - Sunday, December 13, 2009 - link

    PowerPC is Power, but more to the point, Motorola sold off their CPU tech and fabs; IBM did continue to develop PPC further than Freescale, as it's not Motorola who's behind it anymore. The PowerPC 970MP was used for several years after the Mac switched, in low-end System p systems and POWER blade servers. Fixstars, who bought Terra Soft/YDL, do still even sell a PowerPC 970MP workstation. But there's no need to develop it anymore; POWER6 got the VMX/AltiVec unit. G5s weren't actually bad, but the move was wise anyway: native Windows, the same notebook processors. It pretty much ended the rivalry, and it wasn't needed for stuff like the Classic environment anymore.
  • Scali - Monday, December 14, 2009 - link

    PowerPC is a subset of POWER (the IBM server ISA on which PowerPC was based).
  • The0ne - Monday, December 7, 2009 - link

    After much too many years I actually don't care what's going on anymore, to be quite honest. None of us is going to change anything by discussing it here. I think no one can do anything to "solve" this crud until someone actually steps up to take the initiative, with a lot of money. Either that or a genius 10-year-old kid creates something revolutionary for all to use, free and open-sourced :)

    x86 has gotten TOO ingrained and TOO big. I don't think it can be killed, like MS Windows. You would think it makes sense to somehow start from scratch, but that's like passing algebra while chewing bubblegum.
  • Zoomer - Wednesday, December 9, 2009 - link

    I think the Power architecture still lives on as an embedded processor ISA. See the Freescale Power series.
  • Scali - Monday, December 7, 2009 - link

    I suppose Intel and AMD need to get together and decide what parts of the instruction set can be abandoned.
    I think an obvious example is 3DNow!
    There's very little software that uses it; developers abandoned it in favour of SSE years ago. Intel never supported 3DNow! anyway, so any code with 3DNow! has a workaround.
    MMX is also pretty useless since SSE2. With SSE2 you can do the same operations on the SSE registers, without messing up the FPU stack.
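
    To make that concrete, here is the same packed 16-bit add done with MMX and with SSE2 intrinsics (a sketch on arbitrary data): the MMX version has to issue EMMS before any x87 floating-point code can run again, which is exactly the FPU-stack mess SSE2 avoids.

        #include <mmintrin.h>    /* MMX  */
        #include <emmintrin.h>   /* SSE2 */

        /* Four packed 16-bit adds on the aliased x87/MMX registers. */
        void add_mmx(short *dst, const short *a, const short *b) {
            __m64 r = _mm_add_pi16(*(const __m64 *)a, *(const __m64 *)b);
            *(__m64 *)dst = r;
            _mm_empty();   /* EMMS: must clear MMX/x87 state before using the FPU again */
        }

        /* Eight packed 16-bit adds on the separate SSE registers; no EMMS needed. */
        void add_sse2(short *dst, const short *a, const short *b) {
            __m128i va = _mm_loadu_si128((const __m128i *)a);
            __m128i vb = _mm_loadu_si128((const __m128i *)b);
            _mm_storeu_si128((__m128i *)dst, _mm_add_epi16(va, vb));
        }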

    16-bit mode can be abandoned as well... 64-bit OSes don't support 16-bit binaries anymore anyway; might as well just use software emulation such as DOSBox.

    Software emulation should be good enough for large parts of the instruction set. Other CPU developers such as Motorola and IBM have been doing it for years.

    A nicely 'cleaned up' x86 which only does 64-bit natively, and natively supports only the instructions that 'make sense', such as SSE (perhaps not even x87, or most of it, as SSE2 replaces that as well, and is preferred anyway in most 64-bit OSes)... that would probably make the CPUs cheaper, smaller, and more efficient.
    Some applications may suffer in terms of performance, but that should be easy to fix with a recompile. Without a 'lightweight' CPU there's just nothing forcing a recompile, so it never happens.
  • wetwareinterface - Wednesday, December 9, 2009 - link

    "16-bit mode can be abandoned aswell... 64-bit OSes don't support 16-bit binaries anymore anyway, might aswell just use software emulation such as dosbox."

    You are confusing 16-bit ISA instructions with 16-bit compiled binaries that rely on a 16-bit version of the operating system. The former is an instruction that handles data no greater than 16 bits in length. The latter is a program that relies on specific API modules from the operating system that simply aren't there anymore.

    There are still several 16-bit ISA commands that are used even under 64-bit Windows 7. Why have a number value in your program that will never exceed 10, for instance, take up a memory footprint four times larger?
  • Scali - Thursday, December 10, 2009 - link

    I'm not confusing anything. I'm saying that the 16-bit mode can be abandoned. I didn't say anything about 16-bit operands, so I don't know where your confusion comes from.
    What I'm saying is this:
    16-bit mode is only used during BIOS and the first part of the OS loader.
    Since 64-bit OSes don't have a virtual 16-bit mode anymore, you won't actually be using the 16-bit mode anywhere other than during BIOS. With EFI or something like that, you won't need 16-bit mode at all anymore, since you can start in 32-bit mode right away.
    Then after a while, 32-bit mode can be dropped as well, and only 64-bit mode remains. No more need for mode-switching logic, and with only one mode, instruction decoding becomes simpler as well, since you don't need to take the context of the current mode into account (instructions are encoded slightly differently in the different modes, certain instructions/operands are valid in one mode but illegal in another, etc.).

    So I'm talking about something completely different from you. I'm surprised you don't seem to know what 16-bit mode is (the legacy 8086 mode).
  • Scali - Thursday, December 10, 2009 - link

    Aside from that, I think YOU are confusing something.
    The small immediate operand encoding is not because of 16-bit instructions, but rather because of sign-extension modes.
    Therefore I can encode the '-1' in an instruction like push -1 with just 1 byte, even though it pushes 8 bytes onto the stack in 64-bit mode.
    If you want to use 16-bit instructions in 32-bit or 64-bit mode, you will get a prefix byte in front of the instruction, which switches the CPU's instruction decoder to 16-bit mode for that instruction.
    (And the opposite can be done in 16-bit mode... using a prefix to execute 32-bit wide instructions.) So 16-bit instructions in a 32-bit/64-bit binary are actually LARGER, not smaller.
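
    For the curious, the encodings behind this point (standard x86 opcode bytes, shown here as plain data purely for illustration): the sign-extended imm8 form keeps push -1 at two bytes, while forcing 16-bit operands adds a 0x66 prefix byte and makes the instruction longer.

        /* x86 encodings illustrating the point above (bytes as data, not executed): */
        const unsigned char push_imm8[]  = {0x6A, 0xFF};                   /* push -1: imm8, sign-extended to the full operand size */
        const unsigned char push_imm32[] = {0x68, 0xFF, 0xFF, 0xFF, 0xFF}; /* push -1 with a full 32-bit immediate */
        const unsigned char add_32bit[]  = {0x01, 0xD8};                   /* add eax, ebx in 32/64-bit mode */
        const unsigned char add_16bit[]  = {0x66, 0x01, 0xD8};             /* add ax, bx: the 0x66 operand-size prefix makes it LARGER */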
  • ET - Monday, December 7, 2009 - link

    Yes, I think it could help, but honestly, like the article says, it's a matter of a few percent. Slightly bigger VMs, a few more transistors, and a few specific developers having to work harder (compiler makers).

    ARM is going to take over the market anyway. :)
  • davepermen - Monday, December 7, 2009 - link

    Doubling the number of instructions means the dispatcher has at most one more step to do, as it's mostly logarithmic (if even that; it can actually be reduced further in parts). And finding out what to do, and then actually doing it, are not equal amounts of work. The dispatcher is a tiny part of the logic; in general, it does not cost much at all.

    Still, cleaning up the instruction set would be great. I had hoped for x64 to be more cleaned up and streamlined than AMD designed it back then. That would have been a great first step.
  • Springfield45 - Sunday, December 6, 2009 - link

    This is why I wish people would drop x86 entirely. Backwards compatibility, while desirable at times, need not be native. Look at the performance increases in the last two generations of CPUs. Modern CPUs could EMULATE the computing environment from two generations ago and STILL be faster than high-end processors of the time.
  • jensend - Sunday, December 6, 2009 - link

    I don't see people dropping x86 in the near future, and the cost of emulating all instructions in software is just too great for anybody to go that route. "Hardware-accelerated emulation" with a different ISA à la Loongson 3 might prove to be interesting, but I don't think you'll see mainstream processors go that route soon either. But deprecating a vast number of those instructions now and moving them out of hardware later makes a lot of sense, and the idea of taking measures to keep the ISA from getting further unnecessarily bloated in the future is a no-brainer.
  • GourdFreeMan - Sunday, December 6, 2009 - link

    If you count all of the x86 instructions from different vendors, and treat uses of different types of source registers as different instructions, there are ~3000 of them. See http://www.nasm.us/doc/nasmdocb.html

    In actual fact, though, even when only counting unique opcodes a large number of instructions are the same -- just treating the data in the source registers as being different sized, breaking up the vector registers differently, or doing the same integer operations in signed and unsigned modes.

    Decoding and dispatching is not as hellacious as these numbers might suggest, as most instructions are encoded so their bits actually have meaning as to what functional subset they belong.

    I will concede there are many legacy instructions that clutter the instruction space (BCD anyone?). Backwards compatibility (with forward performance improvements) is generally the reason x86 won the processor wars, however...

    Frankly, I am surprised Intel and its rivals have cooperated so well in respecting each other’s machine code to date. I could very well see Intel treating the x86 instruction space as its own and charging competitors to add their own proprietary extensions. Those who didn't play ball would have their old processors become incompatible with future generations of the x86 architecture... perhaps there are segments of their cross-licensing agreement with AMD (redacted in the public document) that forbid this?
  • dgingeri - Tuesday, December 8, 2009 - link

    The fact of the matter is that AMD and Intel worked together to make the 16-bit and 32-bit x86 instruction sets when the 286 and 386 came out. Both have ownership in that instruction set. Others were allowed to make x86-compatible processors because AMD insisted on an open type of license. If Intel alone owned it, you could bet they'd close it to everyone.

    This has also led to lawsuit after lawsuit between AMD and Intel over who could build what processors. When the K6 first came out, Intel tried to sue them to keep them from producing it because it used the same socket as the Pentium. A court stated that they could use it because it was so close to the original 386 interface, of which AMD did have part ownership.

    When the Pentium II came out, it used a totally different interface, so AMD couldn't do the same thing again. That's why the Athlon came out with the Alpha-based Slot A interface, which was far superior to the PII interface. After that, AMD just kept using far superior interfaces, with Intel playing catch-up. Intel may have a faster base chip, but their interfaces and instruction sets have been behind AMD's for years.
  • GourdFreeMan - Tuesday, December 8, 2009 - link

    Do you have a source I can read about AMD's input into the 16-bit and 32-bit extensions of x86? My memory is a bit fuzzy going that far back, and all I can recall from that era is IBM requiring a second source for x86 chips and Intel licensing AMD to produce clones.
  • tygrus - Monday, December 7, 2009 - link

    It would be interesting to see a comparison of the # of instructions in each instruction set (RISC vs x86).

    Having memory addresses (direct or indirect) as sources makes for a messy ISA and implementation. Explicitly loading into registers, with calculations then using up to three registers, is much better (RISC). Only loads, stores and jumps use memory addresses. The old x87 FP stack was horrible.

    Could Intel or AMD resurrect Alpha, or design a new RISC with sensible vector extensions?

    Could AMD create a CPU that starts executing both branches without committing, and discards the result of the wrong execution branch? Like Hyper-Threading, but the threads become clones.
  • GourdFreeMan - Tuesday, December 8, 2009 - link

    "Having memory addresses (or indirect) as sources makes a messy ISA and implementation. Explicitly load into register then calculations use up to three registers is much better (RISC). Only loads, stores and jumps use memory addresses. Old x87 FP stack was horrible."

    Everything is a trade-off. Consider the cache footprint and the absolute instruction length of the machine code for both architectures. There are no absolute wins, except in the rose-colored world of academia. I will concede the x87 FP stack was simply a dinosaur from a previous age of microprocessors, however.

    "Could Intel or AMD resurrect Alpha or design new RISC with sensible vector extension."

    You know, with a clean-slate design you can do anything... except be the dominate microarchitecture in the computing world. More seriously, the only market for new architectures is the HPC domain of supercomputers... which is probably much better served by research into heterogeneous computing systems with a small number of complex cores that handle branchy code augmented by a larger number of simpler cores that do fast vector processing (e.g. Cell, GPCPU, etc.).
  • GourdFreeMan - Tuesday, December 8, 2009 - link

    Whoops... spelling error. Change "dominate" to "dominant" in my previous post.
  • titan7 - Monday, December 7, 2009 - link

    When Apple moved from CISC 680x0 to RISC PowerPC they actually had MORE instructions than before. So don't get too hung up on the absolute instruction count.

    Branch Prediction is on average no worse than 50% correct (if it guessed randomly), but often well above 90%. Doing both branches would mean 2x the power use for as little as 5% more speed overall.
  • Zool - Wednesday, December 9, 2009 - link

    "Branch Prediction is on average no worse than 50% correct (if it guessed randomly), but often well above 90%. Doing both branches would mean 2x the power use for as little as 5% more speed overall."
    Then my question would be: why does branch prediction/speculation on today's AMD and Intel CPUs take so much space on the core die? For 5% more speed overall?
    The thing is that with 15-or-more-stage superscalar pipelines the penalty is much more than 5%. It depends on how many branches the code actually contains, but they can't count on that and need to keep performance balanced in both cases. IBM's POWER5 and POWER6 have 14-stage pipelines, Intel's Nehalem a 16-stage pipeline, and AMD's Phenom (I could only find it for Opteron, which is the same) 12 stages for integer / 17 for floating point.
    Doing both branches doesn't seem such a bad idea if you had a very simple core and could forget branch prediction/speculation.
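    As a rough back-of-the-envelope model of that trade-off (the branch frequency, flush cost and accuracy figures below are my own assumptions, purely for illustration), a small C snippet shows why ~95% prediction is cheap and 50% guessing is not:

        /* Illustrative only: assumes ~1 branch per 5 instructions and a
         * 16-stage pipeline, i.e. roughly 16 cycles lost per mispredict.
         * These are assumptions, not measurements of any real CPU. */
        #include <stdio.h>

        int main(void)
        {
            const double branch_freq = 1.0 / 5.0;  /* branches per instruction */
            const double flush_cost  = 16.0;       /* cycles lost per mispredict */
            const double accuracy[]  = { 0.50, 0.90, 0.95, 0.99 };

            for (int i = 0; i < 4; i++) {
                double extra = branch_freq * (1.0 - accuracy[i]) * flush_cost;
                printf("accuracy %2.0f%%: ~%.2f extra cycles per instruction\n",
                       accuracy[i] * 100.0, extra);
            }
            return 0;
        }

    With those assumptions, random guessing adds about 1.6 cycles per instruction while a 95% predictor adds only about 0.16, which is why the die area spent on prediction tends to pay for itself on deep pipelines.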
  • jensend - Sunday, December 6, 2009 - link

    If they did that they'd have at least $100 billion in antitrust fines rather than $1 billion.
  • Shining Arcanine - Sunday, December 6, 2009 - link

    Just don't buy AMD processors until they abandon their extensions in favor of Intel's extensions. Problem solved.

    There is absolutely no reason for AMD to make proprietary extensions to x86 that contradict Intel's extensions. Intel made x86 and whatever competitive advantage there is to doing so is negated by the fact that they have so little market share that no one cares about optimizing for their hardware.
  • Targon - Saturday, January 9, 2010 - link

    Some basic facts, since you seem to have missed several generations worth of processor development:

    Intel started trying to make AMD processors incompatible with certain applications by adding SSE. AMD responded with 3DNow. As time went on, Intel stuck with the idea of trying to make AMD processors not run certain applications or not run them well with new instructions over time, while AMD really didn't do it beyond the 3DNow! set.

    The move from 32 bit to 64 bit processors in the home market is ALL due to AMD adding 64 bit instructions to the set of 32 bit instructions of the time. This was not a case of trying to make some useless set of instructions, but was a true desire to bring 64 bit processing to the masses while providing improved performance in 32 bit applications. If you want to kill all AMD extensions, then you kill 64 bit support since Intel copied AMD instructions. Intel 64 bit is Itanium, which is a failed platform, even if there are a handful of systems running it.

    You can't blame AMD for the useless extra instructions when Intel is to blame.
  • Exophase - Sunday, January 10, 2010 - link

    Many years ago, long before any of the "media extension" instruction sets came to be, a legal agreement was reached between Intel, AMD, and other x86 manufacturers that allowed them to freely implement any instruction set changes that the others made.

    Intel didn't make SSE to try to make AMD processors incompatible. You have the order backwards anyway - 3DNow! came first with AMD K6-2, while SSE wasn't available until the later released Pentium 3. Intel went with SSE instead of 3DNow! because it's a less limited design, not because they wanted to split the market. This is indicated by the fact that AMD eventually moved to SSE support instead of 3DNow!.

    I don't think any of the extensions are useless, although they might appear that way to people who don't have particular use for them. They're added because Intel or AMD believes that enough people will benefit from them, and this belief is usually based on programmer feedback. If you look at the instructions and possibly do a little research then it's not hard to see applications where they would prove beneficial. That doesn't make the decision justified in the long run, but I don't think that they're keen on adding expensive execution functionality to their cores just so they can have something to advertise. If that were their angle they'd probably just start making things up.
  • WaltC - Friday, December 11, 2009 - link

    /[Just don't buy AMD processors until they abandon their extensions in favor of Intel's extensions. Problem solved.]/

    Or we could solve it by not buying Intel cpus until Intel decided to go with 100% AMD instruction extensions (I actually haven't bought an Intel cpu since 1999, btw.) To some extent, that's actually what happened with Core 2 64-bit Intel x86 cpus, isn't it? AMD's allowed them to use x86-64 all these years, and since Intel threw in the towel and just wrote AMD a $1.25B check, their new cross-licensing agreement provides Intel with at least 5 more years of x86-64 utilization in its cpus.

    I don't think that "not buying" either company's cpus is any kind of a solution, seriously. Most people aren't going to do that because most people don't care what brand of cpu they buy in their box--they're buying box brand and price, primarily, and don't know or care about the differences in x86 cpus inside.

    I sympathize with the programmer's point of view, here--I really do. Standardizing instructions certainly would make things simpler for the programmer. However, I'm also a firm believer in competition, and two heads are always better than one, imo. x86-64 was 100% AMD's invention, and Intel had to pick it up because it was so successful. OTOH, there've been Intel instruction set extensions which AMD has picked up for the same reason. So for all intents and purposes, there are extraneous x86 extensions made by both companies which programmers should pretty much ignore--unless they want to specialize for a particular cpu--which means they'll be limiting themselves to a smaller market--which means they probably won't do it.

    I think that if both companies "agreed" on a particular set of extensions then it would limit innovation and future product improvement and introduce a lot of stagnation into cpu development. It would surely simplify things for programmers, but it would also slow down product R&D.

    The problem here is we've got two distinct viewpoints: the cpu manufacturers' and the programmers', and they aren't necessarily the same at all. Conflicts like this are inevitable in a competitive cpu market. It isn't Intel versus AMD that we are really talking about, it's Intel and AMD versus programmers who naturally would prefer to have everything much simpler...;)
  • Calin - Tuesday, December 8, 2009 - link

    Yes, AMD should no longer use the AMD64, 64-bit instructions and instead go with Intel's 64-bit instructions...
    ...wait, the Intel 64-bit instructions are AMD's 64-bit instructions
  • piroroadkill - Monday, December 7, 2009 - link

    Oh, so I guess we wouldn't be using AMD64 then?

    Shut the hell up.
  • Shining Arcanine - Monday, December 7, 2009 - link

    It is now known as Intel E64MT.

    Stop being a fanboy.
  • Griswold - Tuesday, December 15, 2009 - link

    First of all, it used to be called EM64T. Now Intel calls it Intel64.

    However, literally everyone calls it AMD64. Linux distros refer to it as AMD64, even Microsoft does so.

    So, before you call somebody a fanboy, you should stop being a fanboy and get your facts straight. Makes it less embarrassing for you.
  • Scali - Tuesday, December 15, 2009 - link

    There are two sides to this story.
    Developers tend to call it 'AMD64' because that is the original name that AMD used.
    Hence, when you browse through folders, you'll often find AMD64 in filenames and directory names.

    However, the problem is that people who are less familiar with hardware won't understand that their Intel processor can run AMD64 code. It can be rather confusing. Hence, Microsoft uses x64 in product names and marketing material. It is a simple name, looks like the x86 which people are already familiar with, and doesn't have a direct link to any brand.
    Microsoft would probably just have used '64', but they already used that for Itanium products, so x64 is there to distinguish x86 from Itanium.
  • piroroadkill - Tuesday, December 8, 2009 - link

    Whoa, I'm not a fanboy at all; I have an Intel system. In fact, the last AMD processor I bought was a K6-2, but it's undeniable that AMD invented the 64-bit x86 extensions we use today.

    "AMD licensed its x86-64 design to Intel, where it is marketed under the name Intel 64 (formerly EM64T)."

    So please, get your facts right.
  • johnsonx - Tuesday, December 8, 2009 - link

    Intel actually calls it EM64T. Anywhere outside of Intel, it's called AMD64. Fanboi.
  • bersl2 - Tuesday, December 8, 2009 - link

    Everybody has different names for it. x86-64. x86_64. x64 (BTW, I want the person who came up with that particular abomination taken out back and shot).

    Or better yet, stop the foolishness and just call it "64-bit x86". Everybody will know what you mean. Nobody will be offended.

    Or we could just switch to a *sane* instruction set. I almost don't care which one.
  • piroroadkill - Tuesday, December 8, 2009 - link

    Mostly AMD64, though:

    BSD systems such as FreeBSD, NetBSD and OpenBSD refer to both AMD64 and Intel 64 under the architecture name "amd64".

    Debian, Ubuntu, and Gentoo refer to both AMD64 and Intel 64 under the architecture name "amd64".

    Java Development Kit (JDK): The name "amd64" is used in directory names containing x86-64 files.

    Microsoft Windows: x64 versions of Windows use the AMD64 moniker... ...For example, the system folder on a Windows x64 Edition installation CD-ROM is named "AMD64"...

    Solaris: The "isalist" command in Sun's Solaris operating system identifies both AMD64- and Intel 64–based systems as "amd64".
  • npaladin2000 - Sunday, December 6, 2009 - link



    If AMD abandons all AMD-created extensions, say good-bye to the extension that is x64, since AMD is the one that created it, not Intel. In fact, it was specifically created to counter Intel's Itanium. We were very happy about that, because Itanium stunk so badly at running x86 code.

    Maybe Intel should abandon all Intel-created extensions for AMD ones because AMD made x64?

    To some degree, you HAVE to have these guys competing, so we get to decide between the two of them (hence we now have x86-64 instead of Intel's nightmarish Itanium). Otherwise Intel makes all the decisions, and we'd probably still be trying to choose between x86 NetBurst and Itanium... which is kind of like trying to decide between being beaten with a hammer or a baseball bat.

  • wetwareinterface - Tuesday, December 8, 2009 - link

    The x86-64 extensions from AMD were quite good, and also superior to the extensions Intel created later.

    However, Itanium was not a bad CPU. Far from it: it was faster at running tasks than any x86 CPU, SPARC, POWER4 and later POWER5, etc.

    Itanium was a good product; it was held back by a lack of software to run on it, since it was a completely new ISA and only supported x86 at all to maintain some backwards compatibility for organizations that might need it. Look at the old Top500 lists and where Itanium sat as a single CPU in benchmarks, and that was at only 1GHz. If Itanium had gained any traction at all, software would have been written for it specifically, Intel would have invested more R&D resources to mainstream it, and we'd be seeing 3.4GHz Itanium quad cores now. Itanium was simply an effort to do exactly what the original proponent of cleaning up x86 wants: to clean up the mess.

    Backwards compatibility is the problem with x86 right now, as a CPU and as a platform. Physical IRQs being limited to 16 (actually 15, because of another backwards-compatibility issue) would not be a reality if we could ditch some of the crap baggage from x86. Yes, logical assignment by the operating system is the norm now, but imagine what we could do with a much larger IRQ range alone, let alone a revamped floating-point instruction set that doesn't have to carry the baggage that makes the current x86 floating-point instructions a joke.
  • ThaHeretic - Tuesday, December 8, 2009 - link

    So IA64's specific weakness was that they (HP/Intel) assumed it was easier to predicate logic in software than in hardware. What they learned was that this is not the case; it's just hard anywhere you try. You can only predicate so many branches in advance before you run out of functional units, no matter how wide your architecture is, and it requires explicit knowledge and tuning of the software/binary/compile process to account for the hardware. The need for recompilation for optimal performance is heavy, and even Intel, which has arguably the best compiler optimizers out there, has had great difficulty generating awesome binaries.

    EPIC (Explicitly Parallel Instruction Computing) isn't even new; it's just a rebrand of VLIW (Very Long Instruction Word), which in all previous incarnations ultimately failed and earned a bad reputation, i.e. VAX went the way of the dodo. Itanium is good in a very, very small niche market: multi-exabyte databanks.

    IA64 was never meant to "clean up the IA32" mess; it was meant to address a totally different market. AMD64 (x86-64) was meant to clean up the IA32 (x86) mess to a large extent. A lot of old, system-specific stuff was removed from long mode. Plus, the x87 floating-point stack was made obsolete by the guaranteed inclusion of SSE1 and SSE2. Plus a doubling of the registers, etc. IA64 was always something totally different, never meant to replace IA32.
  • bsoft16384 - Friday, December 11, 2009 - link

    Well, the problem is the assumption that predication is the solution to branch performance issues at all. The reality is that most branches are predictable enough that predication doesn't really buy you much. It's only in the situation when you have a highly unpredictable branch that branch prediction really breaks down, and then predication starts to be much more useful.

    Note that there are some predicated instructions on x86 as well, but not anywhere near the same scope as on Itanium.
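    A small C illustration of the difference (whether a compiler actually emits a conditional move for the second form depends on the compiler, flags and target, so treat this as a sketch):

        #include <stdint.h>

        /* Branchy form: relies on the branch predictor; a data-dependent,
         * hard-to-predict condition pays a pipeline flush when it guesses wrong. */
        int64_t max_branchy(int64_t a, int64_t b)
        {
            if (a > b)
                return a;
            return b;
        }

        /* Predication-friendly form: compilers commonly turn this into a
         * conditional move (cmov on x86), so there is no branch to mispredict,
         * at the cost of always evaluating both inputs. */
        int64_t max_select(int64_t a, int64_t b)
        {
            return (a > b) ? a : b;
        }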

    It's not quite correct that EPIC isn't new. EPIC is very VLIW-like, but it solves a number of VLIW problems (e.g. how to keep software compatibility for future CPU generations that are wider).

    The bottom line is that we've basically run out of ILP for most code, at least with current research. Increasing the instruction window doesn't get you much, and increasing the issue width doesn't get you much either.

    VLIW/EPIC works really well on some programs, but the bottom line is that the magic compilers that make VLIW/EPIC "better" than an out-of-order multiple-issue design don't really exist. ICC and other good compilers show us that VLIW/EPIC is better some of the time, on some code. In other cases, it's considerably worse.

    I know a lot of people who worked on Itanium (I grew up in Fort Collins, where the HP design team worked) and I remember the rhetoric well. Itanium was NOT just a mainframe CPU. Itanium was going to replace HP-PA (Itanium is in many ways very PA-RISC like), it was going to replace other RISC architectures, and eventually it was supposed to replace x86. It was supposed to be a server architecture, a workstation architecture, and eventually a desktop/mobile architecture.

    Many, many people at HP believed the rhetoric. Was that because they were naive or stupid? No. It's because VLIW (and by extension EPIC) always looks better on paper than it is in practice. VLIW allows you to have wider designs with less logic since you use far fewer resources on resolving instruction dependencies. The reality is that dependencies are sometimes very hard to resolve at compile time. The reality is that some code just doesn't have that much ILP to begin with. The reality is that code is often memory bottle-necked anyway.

    In a way, Itanium was like the Pentium 4. Both are brilliant on paper, and both perform more poorly in practice. The great irony is that Intel decided to push for more parallelism in one design (Itanium) and less in another (Pentium 4). Itanium was supposed to be faster because it was wider and therefore did more per clock. P4 was supposed to be faster because very high clocks would make up for lower IPC.

    The reality is that neither extreme really works.

    P4 ran out of gas because the process technology simply couldn't make a 10GHz P4. Architecturally, P4 was (and is) capable of very high clocks; P4 still has the clock records at over 8GHz. But a CPU needs to be manufactured, and leakage current (and other factors) prevented an 8GHz P4 from being practical.

    Itanium ran out of gas because you can only get so much from ILP. Itanium is wide, has more registers than you could ever want, and has huge caches. It's a 2+ billion transistor (1B+ per core) monstrosity, more than 3x as many as Lynnfield (i7) per core. Despite all the hype, Itanium didn't end up being simpler than out-of-order CPUs, and it didn't end up being dramatically faster per clock (except on certain applications).

    Are these faults of the IA-64 architecture, of the Itanium design, of Intel's manufacturing, or of the compilers and software? We'll probably never really know for sure. But we do know that CPU design is about making trade-offs, and that designs that look good on paper often perform poorly in practice.
  • alxx - Wednesday, December 9, 2009 - link

    Sorry, you're a bit wrong there.
    VLIW is still heavily used by TI in their DSP cores.
    Look at their C6000 series and C6400+, and also at the DSP unit in their OMAP cores used in a lot of mobile phones, and at the DSPs used in some base stations and a lot of other comms equipment.

    A more correct statement would be that VLIW failed in general-purpose computing.

    http://www.eetasia.com/ART_8800445205_499489_NP_cb...
    http://www.ece.umass.edu/ece/koren/architecture/VL...
    http://focus.ti.com/paramsearch/docs/parametricsea...

    Interesting book
    Embedded computing. A VLIW approach to architecture, compilers & tools
  • wetwareinterface - Wednesday, December 9, 2009 - link

    You have missed the IA64 mark by a long shot. IA64 doesn't predicate logic in software; it lets software handle its own data and instruction width more efficiently. For instance, say you have to compare two 16-bit values and fetch a 32-bit float. On x86, with no dependencies, that's a lot of operations: two fetches for the 16-bit values, a compare, then at least two stores (because of a serious lack of registers), then another fetch. IA64 can do all three fetches at once and then store the result of the 16-bit compare locally in a register. That's just one case; there are several instances where IA64 simply kicks the crap out of x86 at doing what CPUs do.

    The VLIW is a means to an end. In VAX's case there weren't enough resources behind the concept to make it worthwhile; in IA64's case there is an abundance of CPU horsepower to handle the concept. The compiler just has the ability to pack more fetches together when it can, and to do the CPU's job ahead of time by organizing dependencies in some cases. Resolving dependencies in the compiler was a bonus that saved even more CPU cycles on IA64 code, and it was necessary because of the software x86 emulation; it was only required for x86 emulation because Intel wanted to junk x86 entirely. In any system there is a lot of non-dependent data being fetched. The problem with x86 is that you can't get very far ahead of time because of a lack of resources in the CPU and few means to grab more at once. You can fetch to level 1 or level 2 cache in 64-bit chunks, but because of the crap x86 ISA, taking them into the ALU or registers is a one-after-the-other affair. IA64 sought to get rid of the limitations of x86 and go forward with a new 64-bit ISA.

    Motorola/IBM/Apple did the exact same thing moving to PowerPC, and it worked well for them. It meant slow software emulation for older code, but a dramatic improvement for new code and a newer, more modern ISA without a lot of garbage they didn't need anymore. Intel was trying to do the same thing, only they didn't have the partner in Microsoft that Motorola and IBM had in Apple: one focused on the mainstream desktop and willing to completely ditch legacy code and start over with a new CPU instruction set. Microsoft had a massively larger and far more varied user base, and couldn't just drop everything the way Apple could.

    HP, on the other hand, could devote a separate effort to IA64 in the server space. For HPC, IA64 kicked the crap out of everything that then existed under HP-UX on a per-CPU basis. The ISA was very good even running at a MHz handicap; it took IBM going to POWER5 and ramping up the GHz, and Intel not updating IA64 because it was spending its resources on Core 2, to finally beat it. Make no mistake, Itanium was a monster even at low clock speeds. It just didn't get any software written for its own ISA except in a few instances, and those were for HPC or server roles.

    You can't judge what IA64 can do with desktop-centric performance benchmarks, because there you aren't running any IA64 code at all; you are running a cross-ISA emulator. And give Intel some credit for their JIT compiler, because it rocked: it took a completely foreign instruction set and ran it at nearly the speed of the CPUs it was designed for, on a CPU foreign to the code. People complained that IA64 running Office and similar x86 apps under emulation felt like an older-generation x86 CPU. Try running PearPC (a PowerPC emulator) and just time the install of OS 8, even today on a Core i7 920 overclocked to 4GHz, and then tell me how bad Itanium was at x86 emulation.


    Lack of software on IA64 is what killed IA64, not the ISA.

    Also, it was actually Intel's intent to transition the mainstream to the IA64 ISA. First came servers; then workstation Xeon motherboards would take either IA64 or x86 Xeons; then the mainstream parts would come after. AMD threw a monkey wrench into the whole Xeon Itanium/x86 transition with the Opteron/x86-64 move.
  • mgambrell - Friday, December 11, 2009 - link

    I just want to clarify something here. Apple's handful of toady developers can be pushed around, but Microsoft doesn't have that clout over their hundreds of thousands of developers. It isn't even possible. I enjoy watching them try just to kick people off XP, and you think you could get them to ditch x86? Ha.
  • cbemerine - Wednesday, December 30, 2009 - link

    "...but Microsoft doesn't have that clout over their hundreds of thousands of developers. It isn't even possible. I enjoy watch them try just to kick people off XP..."

    I do not know what planet you are living on, but they most certainly do have the clout to push every XP user off of it. While via the developers is one minor path; over the last 20+ years Microsoft has been more successful kicking people off older platform via the following methods: Hardware (Intel, Nvidia and others); Software (Corel, Novell and others); BIOS vendors: (all but Coreboot); and of course their own forced auto-updates and auto-upgrade process.

    Its total vendor lock-in and has been so since mid way through Windows 2000. The only way out is not to play...Linux, Unix or Mac OSX.

    Your delusional to ignore past abuses and facts, though you are hardly alone.

    My preferred method is to set a "7 Year Clock"; if after 7 years of actions on the part of Microsoft and those they influence, they are being a good corporate citizen and leaving FUD vendor lock-in tactics behind...based on their ACTIONS, not words...than and only than will I purchase their products. When a vendor causes problems with software/hardware I am running, I do not blame the software, but THAT VENDOR! It really is that simple.
  • yuhong - Sunday, December 6, 2009 - link

    They already did abandon their own SSE5 in favor of AVX.
  • psychobriggsy - Sunday, December 6, 2009 - link

    What about when AMD do the extensions first, and Intel does something different?

    Examples: 3DNow! and AMD's Virtualisation instructions (which were more functional than Intel's, at least early on).

    The sad thing is that it is the broken x86 architecture itself that requires special virtualisation instructions to be present.

    I say that around 2015, 32-bit x86 compatibility should be relegated to a separate 32-bit core in the CPU for all backward compatibility, and the main CPU cores should be 64-bit only with no backwards compatibility; maybe even have the ISA tweaked to account for this (64-bit instruction prefixes not required, for example).
  • darthscsi - Monday, December 7, 2009 - link

    Intel once thought as you do, and created a processor with a 64-bit instruction set that was incompatible with x86. They wound up with a separate execution unit for x86 initially, but have now dropped that in favor of binary emulation in software. But you don't run an Itanium, do you? You have a processor with extensive backwards compatibility. You want a cleaner ISA? Vote with your dollars. (Yes, I've had several Alphas and was sad to see that ISA die.)
  • Lucky Stripes 99 - Thursday, December 17, 2009 - link

    Keep in mind that one major benefit of a CISC-based instruction set is that you can theoretically achieve greater code density than with a RISC processor.

    Look at ARM as an example. You need at least one 32-bit op to fetch and one 32-bit op to work the data. Under M68K, the whole thing can be done with a single 48-bit op. More complex forms of indirect addressing may require several more ops for ARM in order to get your offset. Under M68K, the offset is just added to the single op.

    Sure, the ARM solution makes the prefetch and execution circuits much, much easier to implement. However, you end up taking a byte or two of overhead for each instruction versus the M68K. For IA32 which uses an even denser instruction set, the savings can be even greater.
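    Using the sizes above, a quick bit of arithmetic (purely illustrative, based on the numbers in this comment rather than on measured code) shows the footprint gap for that load-plus-operate pattern:

        #include <stdio.h>

        int main(void)
        {
            /* Per the comment above: ARM needs a 4-byte load plus a 4-byte op,
             * while one M68K memory-operand instruction is 6 bytes (48 bits). */
            const double arm_bytes  = 4.0 + 4.0;
            const double m68k_bytes = 6.0;

            printf("ARM: %.0f bytes, M68K: %.0f bytes (~%.0f%% denser)\n",
                   arm_bytes, m68k_bytes,
                   (1.0 - m68k_bytes / arm_bytes) * 100.0);  /* ~25% */
            return 0;
        }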
  • Scali - Sunday, December 20, 2009 - link

    I don't think that's a benefit anymore. These days memory and cache are relatively cheap. It's much easier to slap a few extra MB onto a system than it is to improve its performance per instruction.
  • Shadowmaster625 - Monday, December 7, 2009 - link

    He's talking about having a dedicated x86 core to maintain backwards compatibility. This is a no-brainer. New multicore CPUs should only have one or two legacy cores; the rest should be more efficiently designed. I'm sure this will happen eventually, as soon as it becomes cheaper to design multicore CPUs in such an asymmetrical manner.
  • MonkeyPaw - Tuesday, December 8, 2009 - link

    Possibly like Fusion and OpenCL? Once GPUs come onto the CPU die and become standard, maybe we can see some processes move to a much cleaner ISA?
  • phaxmohdem - Wednesday, December 9, 2009 - link

    I don't know much about what exactly goes on at the instruction set level, but combining the functions of GPU and CPU seems to me like it would add further complexity to the ISA. The processor would need some way to discern which instructions are meant to be dispatched to the GPU shader cores and which need to be sent to the regular CPU cores... Then it needs some rules for what to do with the data after it comes out of either the GPU or CPU pipeline.

    Bottom line, is that while this utopian vision of a single ISA, unable to be modified by individual companies like Intel or AMD without consent, would perhaps improve things a little in the short run, lack of competition and the incentive to add something useful to your processor to set it apart would be detrimental to progress in the industry in the long run.
  • wolfman3k5 - Monday, December 7, 2009 - link

    If you're talking about x86-64 abandoning 32bit support, then you're clueless. x86-64 is an extension of the 32bit instruction set. What you're saying here has been thought of by AMD when they designed x86-64.

    As for the addition of proprietary x86 instructions, wasn't AMD the company that added 64 bit instructions to the x86 instruction set? Weren't they the ones who created AMD64 or x86-64 as it's referred to? That little proprietary instruction set is what's allowing every Joe Six pack to be able to use more than 4GB of RAM on their desktop.
  • darthscsi - Monday, December 7, 2009 - link

    No, AMD64 did not allow more than 4GB of RAM; Physical Address Extension (PAE) did. PAE was introduced in the Pentium Pro and allows more than 4GB of physical memory in 32-bit mode. There is no technical requirement that the virtually addressable memory be the same size as the physically addressable memory. With PAE, the page tables map 32-bit virtual addresses to 36-bit physical addresses. Microsoft chooses not to enable this in consumer OSes (but does enable it in server OS builds). 32-bit x86 has supported 64GB of RAM with a 4GB per-process virtual address space for a long time.

    http://en.wikipedia.org/wiki/Physical_Address_Exte...
  • GIBson3 - Monday, December 7, 2009 - link

    You aren't 100% correct in your point about moving to a 36-bit addressable space. Any IA-32 (aka x86) program is able to address a maximum of 32 bits' worth of memory; that's still a 4-gigabyte limit. The operating system has the ability to "page" programs into different 4-gig blocks under PAE. The x86_64 extension set (pioneered by AMD, and later duplicated by Intel) currently enables a maximum of 48 bits of virtual address space (that's 256 tebibytes) and 40 bits of physical address space (that's 1 tebibyte), both of which can be pushed to 64 and 52 bits respectively. While PAE was the foundation on which x86_64's memory addressing was based, back when AMD was making the push for x86_64 Intel was taking the line that 64-bit wasn't necessary in the home user space.
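    The arithmetic behind those figures, as a quick sanity check (the 48/40-bit widths are the implementation limits quoted above, not the architectural 64/52-bit ceiling):

        #include <stdio.h>

        int main(void)
        {
            unsigned long long virt_bytes = 1ULL << 48;  /* 48-bit virtual */
            unsigned long long phys_bytes = 1ULL << 40;  /* 40-bit physical */

            printf("48-bit virtual : %llu TiB\n", virt_bytes >> 40);  /* 256 TiB */
            printf("40-bit physical: %llu TiB\n", phys_bytes >> 40);  /*   1 TiB */
            return 0;
        }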

    Standardization of the x86 instruction set makes a lot of sense; it would allow things like AMD64/EM64T to happen faster and more "evenly" instead of this five-year battle between the two major x86 producers. While Intel may have created x86, they have certainly done their fair share of messing with it. From a design standpoint the x86 instruction set is muddy compared to others, and really shouldn't have come out on top against the likes of SPARC, Alpha, etc.

    As for your point about Microsoft choosing not to enable it in consumer OSes, look also at the limitations placed on the commercial OSes: not all of those support "36-bit PAE" (all NT-based OSes support PAE; it's a question of whether or not they use the 36 bits instead of just 32). It's an artificial selling point on Microsoft's part; there is no reason to limit consumer 32-bit OSes to 4 gigs of RAM except to place a "premium" on the ability to use more than that.
  • lemonadesoda - Sunday, March 21, 2010 - link

    There is so much rubbish here in this thread it is embarrassing. Do you guys think that an 8 bit processor could only handle 256 bytes of memory? Of course not.

    The width of the execution registers and the bit width of the memory, execution, and stack pointers have nothing to do with each other. There is no reason a 16- or 32-bit processor can't have 64-bit memory registers/pointers if it was designed that way.

    The problem is that a processor DESIGNED with only 16bit, 24bit, 32bit or 40bit memory pointers cannot just be swapped out with a 64bit memory pointer edition version and still be compatible. All the machine code fails. THAT is why page addressing and x86 extensions have been used.

    Intel and Microsoft have played with other microprocessor architectures... but they have never really caught on. And for the consumer, no matter how ugly x86 is, it works, and the whole hardware and software industry is built around it. Changing that architecture is going to require a lot of bravery.

  • Calin - Saturday, December 12, 2009 - link

    The official reason for the 4GB PAE limit on desktop operating systems is the drivers. Making drivers that work correctly with PAE beyond 32 bits is a bit more difficult.
  • misium - Friday, January 8, 2010 - link

    The official reason for the PAE limit is not the drivers but the fact that PAE is still a 32-bit architecture, and thus uses the same 32-bit compilers with the same 32-bit pointers. 32-bit pointers mean 4GB of address space per process, and that's it.
  • ThaHeretic - Tuesday, December 8, 2009 - link

    @GIBson3: Yeah, mostly right; good post. PAE allowed the OS to address 36 bits of physical memory by expanding the physical address register from 32 to 64 bits of width, but each process is still limited to 4GB unless it sets up some sort of memory file or block-like access.

    There used to be a more significant performance hit for having PAE enabled, particularly with TLB efficiency (with PAE you fit half as many pagetable entries inside your TLB), but that hit has become practically negligible with modern TLBs, especially when using larger page sizes (i.e. hugepages).

    The OS code to set up non-PAE memory is simpler and pagetable entries are smaller, but meh. If you're running in 64-bit mode (x64, whatever), you're using PAE to set up your pagetables, so they've gone to great lengths to negate the 64-bit hit.

    Also, IA32 was rather advanced for virtualization. It supported 4 rings of execution back in the 70s and 80s, and supported all ISA-level features needed for virtualization very, very early on in the academic study of virtualization. AMD64 restricted long mode (64-bit) execution to 2 rings of execution, which meant it did not provide the sufficient and necessary conditions for virtualization, though in AMD's case they had an IOMMU with fencing that helped work around this. Anyway, that's one of the key facets of both Intel's and AMD's virtualization extensions: they add another ring of execution (-1, if you will) for hypervisor execution.
  • Lucky Stripes 99 - Thursday, December 17, 2009 - link

    The four protection rings found in IA32 have nothing to do with virtualization (in the traditional sense). They are a form of security domain, not unlike the access control methods for pages or segments on processors with a full memory management unit.

    Furthermore, the IA32 instruction set has numerous difficulties with regard to virtualization. It traditionally fails to meet the Popek and Goldberg requirements for virtualization due to a number of unprivileged instructions that can modify sensitive status registers, interrupt registers and the stack.

    AMD-V and Intel VT are supposed to restrict those instructions, in addition to the ability to run a hypervisor in ring privilege mode -1.
  • ThaHeretic - Tuesday, December 8, 2009 - link

    Eh, I mistyped. I meant that PAE expanded the physical address ENTRY (not register) in the pagetable from 32 to 64 bits. Because of this doubling of width, pagetables with the same number of entries occupy twice the space, and thus TLBs can only cache half as many entries.
  • JHBoricua - Monday, December 7, 2009 - link

    Umm, PAE is essentially a hack that comes with a performance penalty. Even though MS enabled its use in their server OS line, only a very few applications can take advantage of it (SQL 200x comes to mind).

    The poster is right in that the 64-bit extensions AMD introduced to the x86 architecture paved the way for both Server and Desktop Operating Systems to be able to natively address >4GB of RAM. Not to mention that it paved the way for a greater number of applications to be developed to run natively on 64-bit x86.
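    For what it's worth, the way a 32-bit Windows application reaches physical memory above 4GB on a PAE system is the Address Windowing Extensions (AWE) API, and the amount of bookkeeping it demands is a big part of why so few applications (SQL Server being the classic example) ever bothered. A very rough sketch, with all error handling and the required "Lock Pages in Memory" privilege setup omitted:

        #include <windows.h>

        /* Rough AWE sketch: grab physical pages (which may live above 4GB)
         * and view them through a small 32-bit virtual "window". */
        void awe_sketch(void)
        {
            SYSTEM_INFO si;
            GetSystemInfo(&si);

            ULONG_PTR pages = 1024;   /* number of physical pages to allocate */
            ULONG_PTR *pfns = (ULONG_PTR *)HeapAlloc(GetProcessHeap(), 0,
                                                     pages * sizeof(ULONG_PTR));

            /* 1. Allocate physical pages; these are locked and never paged out. */
            AllocateUserPhysicalPages(GetCurrentProcess(), &pages, pfns);

            /* 2. Reserve a window of virtual address space to view them through. */
            void *window = VirtualAlloc(NULL, pages * si.dwPageSize,
                                        MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);

            /* 3. Map (and later remap) physical pages into the window as needed;
             *    the application has to manage this windowing by hand. */
            MapUserPhysicalPages(window, pages, pfns);
        }

    That manual remapping step is exactly the kind of bookkeeping most applications never wanted to take on, which is why flat 64-bit addressing won out.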
  • iwodo - Sunday, December 6, 2009 - link

    We need a much cleaned-up x86.
    I am sure Apple would be one of those interested.
  • MrPoletski - Wednesday, December 9, 2009 - link

    IMHO,

    The best way to go about it is to start phasing out old instructions, but doing it in a way that any program that now crashes because they are gone can be run up in a VM environment that emulates them.

    With over a thousand instructions there will be serious overlap too, so start amalgamating similar instructions into one, again maintaining the VM environment that can emulate them.

    I.e, move the decoding of old hat instructions into software and re-organise the instruction set.

    Do it over the next 3 gens of processor.

    Should work out ok.
  • Lucky Stripes 99 - Wednesday, December 16, 2009 - link

    Motorola did this back in the days of the 68xxx series. Anyone with an Amiga remember the 68040.library?

    Whenever a program attempted to issue an instruction that was legal on a 68020 or 68030 but was illegal on the 68040, it generated a trap. The 68040.library contained various handlers that would then emulate the retired instruction in software.

    You could do this today using a TSR under DOS or a kernel module under Windows, BSD or Linux.
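    On a modern OS you could even do it per-process without a kernel module; on x86-64 Linux, for example, a SIGILL handler can inspect the faulting opcode, emulate it, and step over it. A minimal sketch of the shape such a handler takes (the 0x0F 0xFF opcode and its "emulation" below are made-up placeholders, not a real instruction):

        #define _GNU_SOURCE
        #include <signal.h>
        #include <stdint.h>
        #include <ucontext.h>

        /* Sketch of trap-and-emulate for a retired instruction (x86-64 Linux). */
        static void sigill_handler(int sig, siginfo_t *info, void *ctx)
        {
            ucontext_t *uc = (ucontext_t *)ctx;
            uint8_t *pc = (uint8_t *)(uintptr_t)uc->uc_mcontext.gregs[REG_RIP];

            if (pc[0] == 0x0F && pc[1] == 0xFF) {      /* our placeholder opcode */
                /* "Emulate" it by poking the saved register file... */
                uc->uc_mcontext.gregs[REG_RAX] = 0;    /* placeholder effect */
                uc->uc_mcontext.gregs[REG_RIP] += 2;   /* ...and skip the opcode */
                return;
            }
            signal(sig, SIG_DFL);   /* not ours: restore default and re-raise */
            raise(sig);
        }

        int main(void)
        {
            struct sigaction sa = {0};
            sa.sa_sigaction = sigill_handler;
            sa.sa_flags = SA_SIGINFO;
            sigaction(SIGILL, &sa, NULL);
            /* ... run legacy code that may contain the retired instruction ... */
            return 0;
        }

    The 68040.library did essentially the same thing, just at the exception-vector level rather than through POSIX signals.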
  • Scali - Sunday, December 20, 2009 - link

    Yup. And Motorola isn't the only one.
    IBM also moved part of their POWER instruction set from hardware to software in much the same way.
    It actually worked reasonably well on Amiga. I recall that even certain variations of mul were software-emulated.
