• What are you working on? v66 - February 2017
    558 replies, posted
[QUOTE=alien_guy;51865322]so how did I manage all my cuda stuff having forgotten alignment was a thing until just now.[/QUOTE] Depends how you used it, I guess? The main issues I'm having are with data alignment when reading from my lookup texture objects, which are created + sent to the GPU [URL="https://github.com/fuchstraumer/CUDA_Noise/blob/master/cpp/modules/generators/Perlin.cpp#L23"]here[/URL]. I'll look into explicitly aligning all my lookup objects, and anything else I send to the GPU. I never had problems using boring old C-style arrays when writing class homework, but that stuff was written following my professor's usage of the CUDA API, which is all <3.0. Since these texture LUTs are used by so many threads, I'd prefer not to stop using texture objects and texture memory for them. I vaguely recall a mention of more stringent alignment requirements for texture objects and other objects using texture memory, compared to those using const memory.
[url]https://cuda-programming.blogspot.co.uk/2013/02/cuda-array-in-cuda-how-to-use-cuda.html[/url] [cpp]
// Texture reference must be declared at file scope (missing from the original snippet)
texture<float, 2, cudaReadModeElementType> texRef;

int width = 3, height = 3;
float h_data[3][3];
for (int i = 0; i < 3; i++)
    for (int j = 0; j < 3; j++)
        h_data[i][j] = i * j;
int size = width * height * sizeof(float);

// Allocate CUDA array in device memory
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
cudaArray* cuArray;
cudaMallocArray(&cuArray, &channelDesc, width, height);

// Copy some data located at address h_data in host memory to device memory
cudaMemcpyToArray(cuArray, 0, 0, h_data, size, cudaMemcpyHostToDevice);

// Set texture parameters
texRef.addressMode[0] = cudaAddressModeWrap;
texRef.addressMode[1] = cudaAddressModeWrap;
texRef.filterMode = cudaFilterModeLinear;
texRef.normalized = true;

// Bind the array to the texture reference
cudaBindTextureToArray(texRef, cuArray, channelDesc);

// Allocate result of transformation in device memory
float* output;
cudaMalloc(&output, size);

// Invoke kernel
dim3 dimBlock(16, 16);
dim3 dimGrid((width + dimBlock.x - 1) / dimBlock.x,
             (height + dimBlock.y - 1) / dimBlock.y);
transformKernel<<<dimGrid, dimBlock>>>(output, width, height, 90);
[/cpp] I think you're just missing a call to cudaBindTextureToArray. Textures are aligned/sized differently on the GPU (you might not even get the format you've asked for, AFAIK), but the host API handles this transformation (unless you map it), at least in OpenCL.
Doing procedural terrain generation because I have nothing better to do [vid]https://puu.sh/uhmKn/12286c3a23.webm[/vid] Gouraud shading: [vid]https://puu.sh/uhmAa/c24fc097ef.webm[/vid]
[QUOTE=Icedshot;51865448][url]https://cuda-programming.blogspot.co.uk/2013/02/cuda-array-in-cuda-how-to-use-cuda.html[/url] [cpp]-snip-[/cpp] I think you're just missing a call to cudabindtexturetoarray. Textures are aligned/sized differently on the gpu (you might not even get the format you've asked for AFAIK) but the host api handles this transformation (except if you map it), at least in opencl[/QUOTE] Thanks for pointing that out! I hadn't seen cudaBindTextureToArray used in any of the official texture-generation examples, so I'm going to have to give that a shot. That would make sense, though. Things mostly work now, but this might help fix some of the bugs.
Protip for programming NPC/AI: have two separate Vector3s, #1 desiredWorldPos and #2 desiredLookAtPos. This way you can easily do more complex behavior, or if you want a simple NPC just do desiredLookAtPos = desiredWorldPos :).
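A minimal sketch of that tip; Vector3 and NpcBrain are illustrative names here, not any particular engine's API:

```cpp
#include <cassert>

// Hypothetical minimal types, just to show the two-target idea.
struct Vector3 {
    float x = 0, y = 0, z = 0;
    bool operator==(const Vector3& o) const { return x == o.x && y == o.y && z == o.z; }
};

struct NpcBrain {
    Vector3 desiredWorldPos;   // where the NPC wants to stand
    Vector3 desiredLookAtPos;  // what the NPC wants to face

    // Complex behavior: walk one way while watching something else.
    void strafeWhileWatching(const Vector3& moveTo, const Vector3& watch) {
        desiredWorldPos = moveTo;
        desiredLookAtPos = watch;
    }

    // Simple behavior: just face wherever you're walking.
    void walkTowards(const Vector3& moveTo) {
        desiredWorldPos = moveTo;
        desiredLookAtPos = desiredWorldPos;
    }
};
```

Keeping the two targets separate means strafing, backpedaling, or glancing at the player all fall out of the same data instead of needing special cases.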
Valve strikes again, this is 10/10 [url]https://steamcommunity.com/games/596420/announcements/detail/521693426582988261[/url]
made a tutorial on how to reverse engineer. [video=youtube;n92TNHHjY8A]https://www.youtube.com/watch?v=n92TNHHjY8A[/video]
Made a Windows 10 app, after a week of waiting they finally tested it: [img]https://dl.dropboxusercontent.com/u/675786/ShareX/2017-02/24_08-52-18.png[/img] Downsider might be dead but they still certified the app!
Had a lab at uni where we did some basic OpenGL stuff like matrix stacking, so of course I took that code and started to make a physically based model viewer out of it: [t]http://puu.sh/ui3vI.jpg[/t] [t]http://puu.sh/ui2FZ.jpg[/t] [t]http://puu.sh/ui13D.jpg[/t]
So I have been working on this demo the past two weeks. We're basically mimicking the built in demo made by Leap Motion and then extending this with more experimental features. That's me in the video! I don't know how to embed this though? Any of these work? [url]https://www.facebook.com/plugins/video.php?href=https://www.facebook.com/futurice/videos/10158140510910005[/url] [url]https://www.facebook.com/futurice/videos/10158140510910005/[/url]
[QUOTE=Icedshot;51868367]Valve strikes again, this is 10/10 [url]https://steamcommunity.com/games/596420/announcements/detail/521693426582988261[/url][/QUOTE] I'd implement this in my engine as I have no current audio implementation, however I can't find any licensing information. They say it's free, but under what license specifically?
Implemented it today. Works really well! I used to use the OculusSpatializer, but it ate CPU like nothing else, and this is hardly noticeable.
So now that the portals are done I'm trying my hand at making a reasonably fast realtime volumetric cloud / general effect pack for Unity Volumetric rendering is fun [IMG]https://cdn.discordapp.com/attachments/234607445874442241/284615335737294848/unknown.png[/IMG] [IMG]http://i.imgur.com/ws5NTRy.gif[/IMG]
working on this mac port again that was supposed to be done like three years ago [thumb]http://i.imgur.com/6tVsL4u.png[/thumb]
update on CUDA woes: turns out I'm running out of heap memory on my device. It took more googling, as the memory exception details aren't really specified in the documentation, at least not anywhere I looked (they probably are, I just can't find them). On a GTX 1070. Uh. I should probably look into lowering memory usage. This also explains why the errors only appear at larger output sizes, but not why in debug mode the errors are thrown yet an image is still output correctly, while in release mode they're thrown with no image output at all. Odd. I've already tried increasing the heap size to the maximum CUDA will give me, but somehow my code is hungry enough to eat all 128 MB available to me. [editline]edit[/editline] If there's one thing Stefan Gustavson's simplex code does that's a bit silly, it's creating a lot of temporary variables that are only used once. The set of three simplex cell values "t0, t1, t2" also have companions "t20, t40, t21, t41, t22, t42", which are just t0, t1, t2 raised to the second and fourth powers. These are only used once, so why have the device allocate six extra floats per thread? I also might need to start doing device-side malloc and free calls, since the fractal looping nature may cause issues. Memory is only freed at the end of a kernel, and I think I can do better than that.
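For illustration, the per-corner contribution can fold those squared/fourth-power temporaries into a single expression. This is a generic 2D sketch of the idea (not Gustavson's actual code, and the name cornerContribution is made up), so the six extra named floats never exist:

```cpp
#include <cassert>
#include <cmath>

// One corner's contribution to 2D simplex noise: t^4 * (grad . d),
// with t^2 and t^4 computed in place instead of being stored in
// separate t2x/t4x variables per corner.
inline float cornerContribution(float dx, float dy, float gx, float gy) {
    float t = 0.5f - dx * dx - dy * dy;  // falloff radius for this corner
    if (t < 0.0f) return 0.0f;           // outside the corner's influence
    t *= t;                              // t^2
    return t * t * (gx * dx + gy * dy);  // t^4 * (gradient dot offset)
}
```

A compiler will often do this folding anyway, but writing it this way at least makes the register footprint per thread explicit, which is what matters for GPU occupancy.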
[QUOTE=Karmah;51869758]I'd implement this in my engine as I have no current audio implementation, however I can't find any licensing information. They say its free but what under what license specifically?[/QUOTE] license is [url=https://github.com/ValveSoftware/steam-audio/blob/master/LICENSE.md]here[/url]
Kudos to whoever can tell me what this does, and double kudos to whoever can tell me why it doesn't work. Language is C. [IMG]http://i.imgur.com/C3OE6xS.png[/IMG]
[QUOTE=Sidneys1;51870732]Kudos to whoever can tell me what this does, and double kudos to whoever can tell me why it doesn't work. Language is C. [IMG]http://i.imgur.com/C3OE6xS.png[/IMG][/QUOTE] I don't even what this is trying to accomplish.
[QUOTE=Sidneys1;51870732]Kudos to whoever can tell me what this does, and double kudos to whoever can tell me why it doesn't work. Language is C. [IMG]http://i.imgur.com/C3OE6xS.png[/IMG][/QUOTE] it does foo it doesn't work because its fubar
[QUOTE=Sidneys1;51870732]Kudos to whoever can tell me what this does, and double kudos to whoever can tell me why it doesn't work. Language is C. [IMG]http://i.imgur.com/C3OE6xS.png[/IMG][/QUOTE] It may be trying to store the length of the string **bar in *bar, but... the problem is, if **bar isn't 0, it's an infinite loop.
[QUOTE=DarKSunrise;51870648]license is [url=https://github.com/ValveSoftware/steam-audio/blob/master/LICENSE.md]here[/url][/QUOTE] Why do they use GitHub though? It seems like a lot of big companies currently tend to abuse GitHub by just making a repo to host their docs + releases, which basically just uses bandwidth and storage without contributing anything to the open-source community... :/
[QUOTE=Ac!dL3ak;51871000]It may be trying to store the length of the string **bar in *bar, but... the problem is, if **bar isn't 0, it's an infinite loop.[/QUOTE] That's what I figured it was doing; however, it will run into an index-out-of-bounds error when it gets to the end. There's probably a better way to accomplish whatever it's trying to do.
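For comparison, a plain pointer-walk length count does the same job without smuggling the count through a char** and dividing by the pointer size; this is just a generic sketch, not a fix for the puzzle above:

```cpp
#include <cassert>

// Walk to the terminating '\0' and return the distance travelled.
// Named my_strlen to avoid colliding with the standard library's strlen.
long my_strlen(const char* s) {
    const char* p = s;
    while (*p) ++p;   // advance until the NUL terminator
    return p - s;     // pointer difference is the character count
}
```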
[QUOTE=johnnyaka;51871045]Why do they use GitHub tho. It seems like a lot of big companies currently tend to abuse GitHub by just making a repo to host their docs + releases, which basically just uses bandwith and storage without contributing anything to the open-source community... :/[/QUOTE] If they cared it'd be against their TOS.
Got tired of trying to make the texture-LUT based CUDA code work, so I rewrote all my base CUDA noise generation code to be based on pre-existing code for things like GLSL/Cg. It worked! Perlin "Billow": [t]http://i.imgur.com/SFlUxuM.png[/t] Simplex "Billow": [t]http://i.imgur.com/eUAMZZF.png[/t] The simplex stuff is significantly slower, but I think it looks better: you get fewer straight lines and it doesn't align to a grid quite like Perlin does. The grid effect is subtle, but visible. There's still a bit of artifacting from the simplex (triangular) grid in the simplex output, but it's more subtle. The two still look pretty damn similar though, which is actually pretty good. I think the simplex noise should give higher potential "detail", so to speak, but other than that they're not very different. I was able to generate a 16384x16384 texture in 250 ms, though. It did require 6 GB of RAM to get the image back on the CPU, however D:
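For anyone curious what "billow" means here, it's the usual fractal accumulation with the absolute value of each octave folded and remapped. A hedged CPU sketch; baseNoise stands in for any 2D gradient-noise function returning values in [-1, 1] (Perlin or simplex alike) and is an assumption, not the poster's actual code:

```cpp
#include <cassert>
#include <cmath>

// Fractal "billow" accumulation: take |noise| per octave, remap to
// [-1, 1], and sum with decreasing amplitude and increasing frequency.
float billow(float x, float y, int octaves, float lacunarity, float gain,
             float (*baseNoise)(float, float)) {
    float sum = 0.0f, amplitude = 1.0f, frequency = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        // Folding |n| into [0, 1] then remapping to [-1, 1] produces the
        // puffy, cloud-like ridges that "billow" is named for.
        float n = std::fabs(baseNoise(x * frequency, y * frequency));
        sum += (2.0f * n - 1.0f) * amplitude;
        frequency *= lacunarity;
        amplitude *= gain;
    }
    return sum;
}
```

The same accumulation loop works per-thread on the GPU; only the baseNoise call differs between the Perlin and simplex variants.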
[QUOTE=johnnyaka;51871045]Why do they use GitHub tho. It seems like a lot of big companies currently tend to abuse GitHub by just making a repo to host their docs + releases, which basically just uses bandwith and storage without contributing anything to the open-source community... :/[/QUOTE] At the very least maybe it will promote Git. Coming from a Git background and recently starting a job where TFS is used is like suddenly having your arms tied behind your back and being forced to operate your version control by mashing your head against the keyboard. I miss Git :(
[QUOTE=Ac!dL3ak;51871000]It may be trying to store the length of the string **bar in *bar, but... the problem is, if **bar isn't 0, it's an infinite loop.[/QUOTE] Whoops, you're right, that should be: [code] void foo(char*** bar) { for (char* baz = **bar, (*bar) = 0; *baz; baz++, (*bar)++); } [/code] Good catch. And you're correct: [code] void strlen(char*** strptrptr) { for (char* str = **strptrptr, (*strptrptr) = 0; *str; str++, (*strptrptr)++); } void main() { char* str = "test"; char** strptr = &str; strlen(&strptr); long len = (long)strptr / 8; printf("The length of the string is %ld\n", len); } [/code] Now figure out why it doesn't work. ;)
I got the selector module working, whose function is best explained with this little diagram I created as part of my final project report: [t]http://i.imgur.com/D7kJnOV.jpg[/t] I tested a similar setup using libnoise, and the results weren't that surprising, but goddamn. There's just no match for the GPU when it comes to stuff like this. The CPU is 240 times slower than the GPU, and I'm still only hooking basic modules together - plus, 95% of that time is spent generating the initial noise values on the GPU, whereas it's split more evenly on the CPU. Starting to hate CUDA just a [I]little[/I] bit less rn
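The selector idea from the diagram can be sketched on the CPU like this: pick between two noise sources based on a control value, with a blend band around the threshold so the seam isn't a hard edge (similar in spirit to libnoise's Select module). The function and parameter names are illustrative, not the project's actual API:

```cpp
#include <cassert>
#include <cmath>

// Choose sourceA below the threshold, sourceB above it, and linearly
// blend between them inside a band of width 2*falloff around the
// threshold so the transition is smooth.
float selectModule(float sourceA, float sourceB, float control,
                   float threshold, float falloff) {
    if (control < threshold - falloff) return sourceA;
    if (control > threshold + falloff) return sourceB;
    // Inside the falloff band: interpolate between the two sources.
    float t = (control - (threshold - falloff)) / (2.0f * falloff);
    return sourceA + t * (sourceB - sourceA);
}
```

On the GPU each thread evaluates this independently per sample, which is why chaining modules like this stays embarrassingly parallel.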
[QUOTE=Trumple;51871526]At the very least maybe it will promote Git Coming from a Git background and recently starting a job where TFS is used is like suddenly having your arms tied behind your back and being forced to operate your version controlling by mushing your head against the keyboard I miss Git :([/QUOTE] Maybe they'll let you use [URL="https://github.com/git-tfs/git-tfs"]a proxy[/URL] if you ask.
[QUOTE=paindoc;51871523] The simplex stuff is significantly slower [/QUOTE] that doesn't sound right
[QUOTE=phygon;51869820] [IMG]http://i.imgur.com/ws5NTRy.gif[/IMG][/QUOTE] "Time, Dr. Freeman?" That looks truly beautiful.