Where can I try the new SIMD feature of WebAssembly? - webassembly

Where can I try the new SIMD feature of WebAssembly? Is there any other online, instant WebAssembly compiling platform like Emscripten?

Related

Does WebAssembly support OpenMP?

As the title says: does anyone know whether WebAssembly supports OpenMP?
If it does, how do I use it?
Thanks.
No, WebAssembly does not support OpenMP. WebAssembly is an assembly language for the web. If you have the source code for OpenMP, and it is written in C or C++, you might be able to compile it to WebAssembly using Emscripten.

Intrinsics - can't find <ia64intrin.h> but have <ia32intrin.h>?

Whilst looking at the Intel Intrinsics PDF (to try and work out which headers need to be included) I can see that there is an <ia64intrin.h> header. However, I only seem to have <ia32intrin.h> available.
What do I need to do to set up the ability to use all Intel intrinsics features? I have the Intel C/C++ compiler....
Do try to focus on the intrinsics that are supported by the specific processor you want to target. That is pretty unlikely to be an Itanium these days. Itanium is so unusual now that its support isn't included anymore; the Itanium compiler is a separate download.

How do I use ARM NEON intrinsics?

Basically, I'm developing for an iPhone and I compile fine on the Mac; however, I want to use NEON intrinsics to accelerate my vector math. I have experience with SSE and AVX, but I have no idea where to get the NEON header with the intrinsics from. I found only one on the net and it only worked for GCC; all the functions had some __builtin keywords behind them. I'm compiling with the Xcode LLVM 5.0 compiler. I know I can use ARM assembly, but I'd like to use the intrinsic functions instead, since they make it easier. I've seen some in DirectXMath, encapsulated in vector objects; those also had an #include <arm_neon.h>, but there was no arm_neon.h anywhere.

C/C++ usage of special CPU features

I am curious: do new compilers use the extra features built into new CPUs, such as MMX, SSE, 3DNow! and so on?
I mean, the original 8086 did not even have an FPU, so a compiler that old cannot use one, but new compilers can, since an FPU is part of every new CPU. So, do new compilers use the new features of the CPU?
Or, perhaps the better question: do the new C/C++ standard library functions use the new features?
Thanks for any answers.
EDIT:
OK, so, if I understand all of you correctly, even some standard operations, especially on floating-point numbers, can be done faster using SSE.
In order to use it, I must enable this feature in my compiler, if the compiler supports it. If it does, I must be sure that the targeted platform supports those features.
For system libraries that require top performance, such as OpenGL, DirectX and so on, this support may already be built into the system.
By default, for compatibility reasons, the compiler doesn't use these features, but you can add this support using special C functions delivered by, for example, Intel. This should be the best way, since you can directly control whether and when you use the special features of the desired platform, and so write applications that support multiple CPUs.
gcc will support newer instructions via command line arguments. See here for more info. To quote:
"GCC can take advantage of the additional instructions in the MMX, SSE, SSE2, SSE3 and 3dnow extensions of recent Intel and AMD processors. The options -mmmx, -msse, -msse2, -msse3 and -m3dnow enable the use of these extra instructions, allowing multiple words of data to be processed in parallel. The resulting executables will only run on processors supporting the appropriate extensions--on other systems they will crash with an Illegal instruction error (or similar)."
These instructions are not part of any ISO C/C++ standards. They are available through compiler intrinsics, depending on the compiler used.
For MSVC, see http://msdn.microsoft.com/en-us/library/26td21ds(VS.80).aspx
For GCC, you could look at http://developer.apple.com/hardwaredrivers/ve/sse.html
AFAIK, SSE intrinsics are the same between GCC and MSVC.
Compilers will aim for producing code for a minimal set of features in a processor. They also provide compilation switches that allow you to target specific processors. In this manner, they can sell more compilers (to those folks with old processors as well as the trendy folk with new ones).
You will need to study the documentation that came with your compiler.
Sometimes the runtime library will contain multiple implementations of a feature, and the library will dynamically choose between implementations when the program is run. The overhead might be the cost of a function pointer call instead of a direct function call, but the benefit could be much greater when using a CPU-specific optimised function.
JIT compilers (for VM languages such as Java and C#) take this one step further and compile the bytecode for the specific CPU that they are running on. This gives your own code the benefit of CPU-specific optimisation. This is one reason why Java code can sometimes be faster than compiled C code: the Java JIT compiler can delay its optimisation decisions until the program is run on the actual target machine, whereas a C compiler must make those decisions without always knowing what the target CPU is. Furthermore, JIT compilers evolve and can make your program faster over time without you having to do anything.
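The run-time selection between implementations described above can be sketched in C roughly as follows. This is only an illustration, not how any particular runtime library does it: the names add_fn, add_baseline, add_avx2 and pick_add are made up for the example, and it assumes GCC or Clang on x86, where __builtin_cpu_supports and the target function attribute are available. On any other compiler or architecture the sketch simply falls back to the baseline loop.

```c
#include <stddef.h>

/* Common signature shared by every implementation (hypothetical name). */
typedef void (*add_fn)(float *dst, const float *a, const float *b, size_t n);

/* Baseline version: compiled with the default target flags. */
static void add_baseline(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* Same loop, but the compiler is allowed to emit AVX2 code for this
   one function only (assumption: GCC/Clang target attribute). */
static __attribute__((target("avx2")))
void add_avx2(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
#endif

/* Chooses an implementation once, based on the CPU the program runs on. */
add_fn pick_add(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_avx2;
#endif
    return add_baseline;
}
```

A caller would do something like add_fn add = pick_add(); once at start-up and then call add(dst, a, b, n); everywhere. The cost is the indirect call through the function pointer mentioned above.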
If you use the Intel C compiler, and set sufficiently high optimisation options, you will find that some of your loops get 'vectorised', which means the compiler has rewritten them to use SSE-style instructions.
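If you want to use SSE operations directly, you use the intrinsics defined in the 'xmmintrin.h' header file; say
#include <stdio.h>
#include <xmmintrin.h>
int main(void)
{
    float ww[4];
    __m128 V = _mm_set1_ps(1.5f);       /* all four lanes = 1.5 */
    __m128 U = _mm_set_ps(0, 1, 2, 3);  /* arguments run from the highest lane to the lowest */
    __m128 W = _mm_add_ps(U, V);        /* lane-wise addition */
    _mm_storeu_ps(ww, W);               /* unaligned store: ww = {4.5, 3.5, 2.5, 1.5} */
    printf("%g %g %g %g\n", ww[0], ww[1], ww[2], ww[3]);
    return 0;
}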
Different compilers use different new features. Visual Studio will use SSE/SSE2, and I believe the Intel compiler supports the very latest CPU features. You should, of course, be wary about the market penetration of your favourite feature.
As for what your favourite standard library uses, that depends on what it was compiled with. However, the C++ standard library is typically compiled on-site, since it's very heavily templated, so if you enable SSE2, the C++ std libs should use it. As for the CRT, it depends on what it was compiled with.
There are generally two ways a compiler can generate code that uses special features like these:
When the compiler itself is compiled, you configure it to generate code for a particular architecture, and it can take advantage of any features it knows that architecture will have. For example, if gcc is configured for an Intel processor new enough (or is that "not old enough"?) to contain an integrated FPU, it will generate floating-point instructions.
When the compiler is invoked, flags or parameters can specify the type of features available to the processor that will run the program, and then the compiler will know it is safe to use these features. If the flags aren't present, it will generate equivalent code without using the special instructions provided by those features.
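One way to see the effect of such flags is to look at the macros the compiler predefines when an extension is enabled. The sketch below assumes GCC or Clang on x86, which define macros such as __SSE2__ or __AVX2__ to match the -m options in effect; the function name enabled_simd_level is invented for this example, and other compilers use different macros.

```c
/* Returns the name of the widest x86 SIMD extension the compiler was
   allowed to use when this file was compiled, judged only from the
   predefined macros (assumption: GCC/Clang macro names). */
const char *enabled_simd_level(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "none";
#endif
}
```

Printing the result from a program built once with plain gcc and once with, say, gcc -msse3 should show the difference. Note that this reports what the compiler was told it may use, not what the CPU running the program actually supports.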
If you're talking about code written in C/C++, the new features are exploited if you tell your compiler to do so. By default, your compiler probably targets "plain x86" (naturally with FPU :) ), usually optimized for the most widespread processor generation at the moment, but still able to run on older processors.
If you want the compiler to generate code that also uses the new instruction sets, you should tell it to do so with the appropriate command line switch/project setting; for example, for Visual C++ the option to enable SSE/SSE2 instruction generation is /arch.
Notice that many features of new instruction sets cannot be exploited directly in "normal" code, so you are usually provided with compiler intrinsics to operate on the particular datatypes native to the new instruction sets.
Intel provides updated CPUID example code every time they release a new CPU so that you can check for the new features, and has done so for as long as I can remember. At least this is what I found the first time I thought about this same question myself.
Using CPUID to Detect the presence of SSE 4.1 and SSE 4.2 Instruction Sets
As new compilers are released they add the new features directly like VS2010 for example.
Visual C++ Code Generation in Visual Studio 2010
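For illustration, a CPUID check for SSE 4.1 and SSE 4.2 along the lines mentioned above might look like the sketch below. It is not Intel's sample code. It assumes GCC or Clang on x86, whose <cpuid.h> header provides __get_cpuid; the struct and function names are made up for the example. The feature bits themselves (CPUID leaf 1, ECX bit 19 for SSE4.1 and bit 20 for SSE4.2) come from the processor documentation.

```c
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#define HAVE_CPUID_H 1
#endif

/* Hypothetical result type: 1 means the feature is reported, 0 means not. */
typedef struct {
    int sse41;
    int sse42;
} sse4_support;

sse4_support detect_sse4(void)
{
    sse4_support r = {0, 0};
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    /* Leaf 1 returns the basic feature flags in ECX and EDX. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        r.sse41 = (int)((ecx >> 19) & 1u);  /* ECX bit 19: SSE4.1 */
        r.sse42 = (int)((ecx >> 20) & 1u);  /* ECX bit 20: SSE4.2 */
    }
#endif
    return r;
}
```

On compilers or architectures where the header is not available, the sketch reports both features as absent rather than failing to build.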

Executing CPU/GPU instructions from managed code

Taking into account the execute disable bit, what is the recommended way of executing instructions against a native processor from a high-level managed environment such as VB.NET 2008 or C#? In addition, has anyone achieved something similar by executing GPU instructions against a graphics processor?
There are a few GPU options for C#, for direct access without reverting to P/Invoke or writing your own C++ wrappers:
Brahma is quite interesting. It provides access to the GPU directly via a customized LINQ provider. The code includes some samples of highly computational methods run on the GPU, all written in C# via LINQ.
SlimDX provides a nice .NET wrapper to all of the major DirectX functionality. With custom shaders, you can do computation on the GPU via DirectX. It also includes DX11 support, so you can use the compute shaders directly (if you have the hardware for it).
You can access CUDA via CUDA.NET.
You can use OpenCL via OpenCL.NET.
As for CPU instructions, this typically would require dropping to lower-level native code with assembler instructions. Probably the most interesting completely managed (at least partially related) option would be to use Mono.Simd, which provides direct access to the SIMD instructions in the CPU from managed code when running on the Mono stack.
It is not an option. You'll have to P/Invoke a function in a DLL that was generated by MASM or written in unmanaged C/C++, using inline assembly or intrinsics. Or use the C++/CLI compiler and generate mixed-mode code with #pragma managed.
Beware that you now can no longer depend on the JIT compiler generating whatever platform code is suitable for the operating system. Use Project + Properties, Build tab, Platform Target to force the architecture to match your unmanaged code.
Look at CUDA for managed GPU code.
Create an unmanaged .dll library with whatever instructions you want and use P/Invoke to call it.
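The unmanaged side of such a library could be as small as the C sketch below. The function name native_sum_i32 and the NATIVE_API macro are invented for this example, and the body is a plain loop standing in for whatever intrinsics or assembly you actually want to run.

```c
#include <stdint.h>

/* Export macro (hypothetical name): makes the symbol visible to callers
   of the shared library on Windows and on GCC/Clang platforms. */
#if defined(_WIN32)
#define NATIVE_API __declspec(dllexport)
#else
#define NATIVE_API __attribute__((visibility("default")))
#endif

/* Sums 'count' 32-bit integers. Fixed-width types are used so the
   managed declaration can map them to int without guessing sizes. */
NATIVE_API int32_t native_sum_i32(const int32_t *values, int32_t count)
{
    int32_t total = 0;
    for (int32_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}
```

On the managed side this would be declared with a DllImport attribute naming the DLL. The function above uses the default C calling convention (cdecl), so on 32-bit Windows the managed declaration would need to specify a matching calling convention.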

Resources