View Issue Details

ID: 0000283
Project: madVR
Category: bug
View Status: public
Last Update: 2015-04-14 14:18
Reporter: cyberbeing
Assigned To: madshi
Priority: high
Severity: tweak
Reproducibility: always
Status: closed
Resolution: no change required
Platform: x64
OS: Windows 7 SP1
OS Version: 7601
Summary: 0000283: OpenCL registry key is not regenerated when switching between madVR x86 and madVR x64
Description: madVR 0.87.21 fails to regenerate the cached OpenCL kernel, stored under the registry key HKEY_CURRENT_USER\Software\madshi\madVR\OpenCL, when switching between madVR x86 and x64.

On NVIDIA, this causes the NVVM compiler to generate a kernel using .address_size 64 if madVR x64 was run first, yet .address_size 32 if madVR x86 was run first.
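To check which variant is currently cached, one could look for the `.address_size` directive near the top of the PTX. A minimal Python sketch, assuming the cached NVIDIA kernel is available as PTX text (reading it out of the registry is not shown). The function name `ptx_address_size` is hypothetical, and the sample is an illustrative fragment, not real madVR output:

```python
import re
from typing import Optional

# Matches a PTX ".address_size 32" or ".address_size 64" directive at the start of a line.
_ADDRESS_SIZE_RE = re.compile(r"^\s*\.address_size\s+(32|64)\b", re.MULTILINE)


def ptx_address_size(ptx: str) -> Optional[int]:
    """Return 32 or 64 from the PTX .address_size directive, or None if none is found."""
    match = _ADDRESS_SIZE_RE.search(ptx)
    return int(match.group(1)) if match else None


# Illustrative PTX header fragment (assumed, not taken from an actual cached kernel).
sample = ".version 4.2\n.target sm_30\n.address_size 64\n"
print(ptx_address_size(sample))  # 64
```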

Rather than regenerating the kernel each time, madVR x86 should always make NVVM generate the OpenCL kernel with .address_size 64 if the user is running a 64-bit OS.

There is an open question if there is any performance impact running an .address_size 32 or .address_size 64 with madVR x64, yet both do function.

Also noteworthy: the r349 drivers have an NNEDI3 corruption bug with Kepler cards (mentioned in http://bugs.madshi.net/view.php?id=250) that only occurs with the .address_size 32 kernel.
Tags: No tags attached.
madVR Version: v0.87.21
Media Player (with version info): MPC-HC 1.7.8.156
Splitter (with version info): LAV 0.64.x
Decoder (with version info): LAV 0.64.x
Decoding: Software
Deinterlacing: none (progressive)
DXVA2 Scaling Active: no
Aero / Desktop Composition: On
Problem occurs with mode: all modes
GPU Manufacturer: NVidia
GPU Model: GTX 770
GPU Driver Version: 350.12

Activities

cyberbeing

2015-04-14 03:06

reporter   ~0000946

>There is an open question if there is any performance impact running an
>.address_size 32 or .address_size 64 with madVR x64, yet both do function.

That should have said:
There is an open question if there is any performance impact running an
.address_size 32 or .address_size 64 with _madVR x86_, yet both do function.

madshi

2015-04-14 03:11

administrator   ~0000948

I can't technically "choose" the nvidia NVVM compiler bitdepth. The x86 version of madVR must always use the x86 version of NVVM. There's no way around that. Also I think the .address_size is not something I have control over.

What I could do is regenerate the OpenCL kernel every time the user switches between madVR x86 and madVR x64. But really, I don't think that's a good idea. Why? Because there should really be no difference. If there's a difference, that would count as an NVVM bug as far as I'm concerned. The GPU doesn't care which bitdepth the CPU process runs in. Regenerating the OpenCL kernel takes time, quite a lot of time, actually. There's a good reason why I'm caching it in the registry.

There's no technical reason why madVR should regenerate the OpenCL kernel when switching madVR bitdepths. The only reason would be to work around an NVidia driver bug. But that's not a good enough reason in my book. Actually, the current solution allows you to manually select which kernel to use, by manually deleting the kernel and then starting the madVR bitdepth you want the kernel to be compiled with. If I automatically regenerate the kernel every time you switch bitdepths, you could not make the "good" kernel stick for 32bit.
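The caching behaviour described here works like a single-slot, first-writer-wins cache: whichever bitness finds the slot empty compiles and stores the kernel, and every later run of either bitness reuses it until the entry is deleted. A minimal Python model, assuming a dict in place of the registry key and a hypothetical `compile_kernel` callback in place of the slow NVVM compile (the strings merely stand in for compiled kernels):

```python
from typing import Callable, Dict

# Stand-in for the registry key; "Binary" is the value name mentioned in this thread,
# everything else about the layout is assumed.
CACHE_VALUE = "Binary"


def load_or_compile(cache: Dict[str, str], compile_kernel: Callable[[], str]) -> str:
    """Return the cached kernel, compiling and storing it only when the slot is empty."""
    if CACHE_VALUE not in cache:
        cache[CACHE_VALUE] = compile_kernel()  # slow path, taken only on a cache miss
    return cache[CACHE_VALUE]


registry: Dict[str, str] = {}
load_or_compile(registry, lambda: ".address_size 32")          # madVR x86 runs first
print(load_or_compile(registry, lambda: ".address_size 64"))   # later x64 run reuses it: .address_size 32

registry.clear()                                               # user deletes the cached kernel
print(load_or_compile(registry, lambda: ".address_size 64"))   # x64 now compiles first: .address_size 64
```

This is why deleting the entry and then starting the desired bitness makes that kernel "stick".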

cyberbeing

2015-04-14 03:17

reporter   ~0000950

Then how about creating a separate registry value for the x64 kernel, called BinaryX64 or similar?
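The proposal amounts to keying the cache on process bitness. A minimal sketch, assuming the existing value is named `Binary` and the x64 kernel would go into a new `BinaryX64` value; the selection function `cache_value_name` is hypothetical and not part of madVR. In Python the process bitness can be derived from the pointer size reported by `struct.calcsize("P")`:

```python
import struct
from typing import Optional


def cache_value_name(process_bits: Optional[int] = None) -> str:
    """Pick the (hypothetical) registry value name for the cached kernel by process bitness."""
    if process_bits is None:
        process_bits = struct.calcsize("P") * 8  # pointer size of the running process, in bits
    if process_bits == 64:
        return "BinaryX64"
    if process_bits == 32:
        return "Binary"
    raise ValueError(f"unexpected process bitness: {process_bits}")


print(cache_value_name(32), cache_value_name(64))  # Binary BinaryX64
```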

madshi

2015-04-14 03:28

administrator   ~0000951

What would that help? 32bit users would still have the same bug.

There is no logical or technical reason for using different caches for OpenCL kernels based on CPU compiler bitdepth. The CPU compiler bitdepth should not matter at all. That it currently does matter is just a temporary NVidia driver bug, which will hopefully be fixed soon.

This is like 32bit MS Paint creating a different BMP file than 64bit MS Paint. Makes no sense. In the same way a 32bit OpenCL compiler should not create a different kernel than a 64bit OpenCL compiler.

cyberbeing

2015-04-14 03:32

reporter   ~0000952

Last edited: 2015-04-14 03:36

Rather than be concerned about the current r349 NVIDIA Kepler corruption bug with OpenCL .address_size 32 (which I assume will eventually be fixed), the bigger issue here seems to be that NVVM generates OpenCL kernels using 64bit math if madVR x64 is run first, yet 32bit math if madVR x86 is run first.

One way or another, I think you should figure out a solution so that the .address_size 64 kernel is used when running madVR x64. You shouldn't expect end-users to start deleting registry keys if they happened to run madVR x86 first after a driver upgrade, at least IMHO.

I don't think NVIDIA would consider it a bug that they are not using 64bit optimizations when told to generate a kernel for x86. As far as I can tell, NVVM has been doing this for years now, so it's probably by design.

cyberbeing

2015-04-14 03:46

reporter   ~0000953

> Regenerating the OpenCL kernel takes time, quite a lot of time, actually.
> There's a good reason why I'm caching it in the registry.

Have you confirmed this is still an issue on NVIDIA? For a while now, they have been caching compute kernels on the hard drive (what madVR stores in the Binary key, plus more). I wouldn't be surprised if NVIDIA didn't regenerate the kernel at all if it was already sitting in the cache.

madshi

2015-04-14 10:00

administrator   ~0000956

None of my code needs/wants 64bit math. 32bit is just fine. GPUs are terribly slow when doing double precision. OpenCL knows different type names for different data bitdepths. E.g. "int", "uint" and "float" are strictly defined as 32bit integer/float, while "long", "ulong" and "double" are strictly defined as 64bit integer/float. My OpenCL code uses "int", "uint" and "float" everywhere, indicating that I want 32bit integer/float, not 64bit. There's no leeway here for NVidia to interpret my OpenCL kernel source code in any way. The OpenCL spec is very strict about these things, and NVVM x86 is not allowed to interpret my code any differently compared to NVVM x64.
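The fixed widths referred to here can be checked against Python's `struct` standard sizes. Unlike host C, where `long` is 32 bits on Windows but 64 bits on 64-bit Linux, OpenCL C pins these scalar types to the widths below on every device. (By contrast, OpenCL's `size_t`, `ptrdiff_t` and pointer types follow the device's CL_DEVICE_ADDRESS_BITS.) A small sketch; the dictionaries are illustrative, not an OpenCL API:

```python
import struct

# Bit widths fixed by the OpenCL C specification for these scalar types.
OPENCL_SCALAR_BITS = {
    "int": 32, "uint": 32, "float": 32,
    "long": 64, "ulong": 64, "double": 64,
}

# Python struct format codes with standard ("<") sizes of the same widths.
STRUCT_CODES = {
    "int": "<i", "uint": "<I", "float": "<f",
    "long": "<q", "ulong": "<Q", "double": "<d",
}

for name, bits in OPENCL_SCALAR_BITS.items():
    # Standard sizes do not depend on whether Python itself is 32-bit or 64-bit.
    assert struct.calcsize(STRUCT_CODES[name]) * 8 == bits, name
    print(f"{name:>6}: {bits} bits")
```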

There's also no API available to tell the compiler what to do. There are no flags like "use 32bit" or "use 64bit". The type specifiers in the kernel source code already define all that.

I'll say it again: If NVVM x86 creates different kernels than NVVM x64, then that is a bug in NVVM, nothing else. There will never ever be a situation where there's a justified technical reason for NVVM x86 to create a different kernel than NVVM x64, with the OpenCL kernel I'm compiling. So there's also never ever a justified reason for me to store x86 and x64 compiled kernels differently. It just doesn't make any sense, nor does it bring any benefit to the user.

If NVVM x86 and NVVM x64 create different kernels then you can practically roll the dice which one is better. It will be pure luck/random, nothing else. So why should I store kernels separately? I see no practical benefit.

The OpenCL source code is also in "madVR\legal stuff". I have to put it there due to LGPL.

Please tell me:

1) What is the practical benefit for storing x86 and x64 compiled kernels differently? What benefit does that bring to the end user?
2) Do you really believe I should custom-adjust the whole madVR logic to mirror incorrect and buggy NVVM behaviour?

cyberbeing

2015-04-14 11:41

reporter   ~0000958

Last edited: 2015-04-14 11:43

> There's no leeway here for NVidia to interpret my OpenCL kernel source code in any way.
> The OpenCL spec is very strict about these things, and NVVM x86 is not allowed to interpret
> my code any differently compared to NVVM x64.

> If NVVM x86 creates different kernels than NVVM x64, then that is a bug in NVVM, nothing else.

> If NVVM x86 and NVVM x64 create different kernels then you can practically roll the dice which one is better.

The kernel actually appears to be identical other than the 64bit memory addressing. After all, PTX is a pseudo-assembly language, which is much lower level than languages like OpenCL and CUDA.

In OpenCL terms, address size seems to be referring to CL_DEVICE_ADDRESS_BITS.


>There's also no API available to tell the compiler what to do.

It does sound like such options are only available in the standalone compilers from NVIDIA and AMD. A quick search seems to suggest that on AMD's compiler the option is called GPU_FORCE_64BIT_PTR, but unlike NVIDIA they don't use this option by default with 64bit OS/software in their driver.

Maybe the answer is actually the opposite: if 64bit addresses are of no benefit to NNEDI3, madVR could instead always generate a 32bit address kernel?


> What is the practical benefit for storing
> x86 and x64 compiled kernels differently?
> What benefit does that bring to the end user?

No idea. I'd naively assume it has some kind of benefit related to 64 bit memory access? Yet maybe it would actually hurt performance in some other area? Here is what some of NVIDIA's docs say about addresses:

"Address arithmetic is performed using integer arithmetic and logical instructions. Examples include pointer arithmetic and pointer comparisons. All addresses and address computations are byte-based; there is no support for C-style pointer arithmetic. The mov instruction can be used to move the address of a variable into a pointer. The address is an offset in the state space in which the variable is declared. Load and store operations move data between registers and locations in addressable state spaces. The syntax is similar to that used in many assembly languages, where scalar variables are simply named and addresses are de-referenced by enclosing the address expression in square brackets. Address expressions include variable names, address registers, address register plus byte offset, and immediate address expressions which evaluate at compile time to a constant address."

"The address must be naturally aligned to a multiple of the access size. If an address is not properly aligned, the resulting behavior is undefined; i.e., the access may proceed by silently masking off low-order address bits to achieve proper rounding, or the instruction may fault.

The address size may be either 32-bit or 64-bit. Addresses are zero-extended to the specified width as needed, and truncated if the register width exceeds the state space address width for the target architecture."
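The rules in these excerpts can be modelled in a few lines. A minimal Python sketch with hypothetical helper names; it only illustrates the quoted rules (byte-based addresses, natural alignment, zero-extension/truncation to the address width), not anything NVVM or madVR actually runs:

```python
def element_address(base: int, index: int, elem_size: int) -> int:
    """Byte address of element `index`. PTX addresses are byte-based, so the
    element size must be multiplied in explicitly (no C-style pointer scaling)."""
    return base + index * elem_size


def is_naturally_aligned(address: int, access_size: int) -> bool:
    """True if `address` is a multiple of the access size in bytes."""
    return address % access_size == 0


def fit_to_width(address: int, width: int) -> int:
    """Zero-extend (a no-op for non-negative ints) or truncate an address to `width` bits."""
    if width not in (32, 64):
        raise ValueError("PTX address sizes are 32 or 64 bits")
    return address & ((1 << width) - 1)


addr = element_address(0x1000, 3, 4)           # 4th 32-bit float in a buffer at 0x1000
print(hex(addr))                               # 0x100c
print(is_naturally_aligned(addr, 4))           # True
print(is_naturally_aligned(addr + 2, 4))       # False: undefined behaviour on the GPU
print(hex(fit_to_width(0x1_0000_100C, 32)))    # 0x100c: upper bits truncated
```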

__________________


So in conclusion, if you don't think there are any advantages or disadvantages to using a 32bit or 64bit address kernel with either madVR x86 or x64, then you probably don't need to do anything. My previous assumption, that the .address_size option makes the compiler replace 32bit math with 64bit math in your OpenCL kernel, seems to have been blatantly incorrect. It instead appears to relate to the use of 32bit or 64bit memory pointers or similar.

madshi

2015-04-14 14:18

administrator   ~0000961

To be honest, I'm not sure exactly how memory management is done in OpenCL. I think there are some modes (at least in OpenCL 2.0) where the GPU and CPU can share memory somehow. Maybe in those cases a compiled OpenCL kernel for 64bit could differ from one for 32bit. But I don't really know. As long as there's no clear indication that the NVVM bitdepth must match the madVR bitdepth to produce correct results, I'm not in favor of changing anything at this point.

If you do find evidence that madVR needs to store OpenCL kernels per bitdepth (and that there are issues otherwise), please reopen this bug. For now I think the current solution is the best approach. So I'm going to close this bug for now.

Issue History

Date Modified Username Field Change
2015-04-14 03:03 cyberbeing New Issue
2015-04-14 03:06 cyberbeing Note Added: 0000946
2015-04-14 03:11 madshi Note Added: 0000948
2015-04-14 03:11 madshi Assigned To => madshi
2015-04-14 03:11 madshi Status new => feedback
2015-04-14 03:17 cyberbeing Note Added: 0000950
2015-04-14 03:17 cyberbeing Status feedback => assigned
2015-04-14 03:28 madshi Note Added: 0000951
2015-04-14 03:28 madshi Status assigned => feedback
2015-04-14 03:32 cyberbeing Note Added: 0000952
2015-04-14 03:32 cyberbeing Status feedback => assigned
2015-04-14 03:36 cyberbeing Note Edited: 0000952
2015-04-14 03:46 cyberbeing Note Added: 0000953
2015-04-14 10:00 madshi Note Added: 0000956
2015-04-14 10:00 madshi Status assigned => feedback
2015-04-14 11:41 cyberbeing Note Added: 0000958
2015-04-14 11:41 cyberbeing Status feedback => assigned
2015-04-14 11:42 cyberbeing Note Edited: 0000958
2015-04-14 11:43 cyberbeing Note Edited: 0000958
2015-04-14 14:18 madshi Note Added: 0000961
2015-04-14 14:18 madshi Status assigned => closed
2015-04-14 14:18 madshi Resolution open => no change required