• What are you working on? v66 - February 2017
    558 replies
I didn't realize until last night that my job is the epitome of the perception and jokes around software developed for the government. I am literally "the intern" developing an entire application by myself from the ground up with no prior qualifications, and it's probably gonna show in the final product no matter how hard I try. Thus continuing the joke/reality. At some point, though, it stops being my fault and becomes the fault of upper management for not assigning qualified programmers to either assist me or take over the job. So there's that, I guess :v:

[editline]28th February 2017[/editline]

[QUOTE=phygon;51886047]Seems to me like the switch would be a cause for concern more than the variables. If you do even a single tiny thing that the compiler can't determine the result of related to the switch it won't be able to flatten it correctly and your code will run like crap

I managed to improve the speed of my cloud shader by a factor of 2 by removing a single if statement [I]designed to break out of the loop early to save time[/I][/QUOTE]

Also, missed this reply, but yeah, that would make sense. I'm still trying to learn best practices and such, and this seems like a pretty clear one in hindsight. And with the GPU being as fast as it is at the simple FMA stuff that makes up the various curving options in that switch statement, there's no reason not to just go with the higher-quality option (and benchmarks/tests showed it wasn't a big speed concern).

[editline]editline[/editline]

Only problem is that GPU debugging is still a tremendous pain in the dick. NSight's CUDA implementation/abilities aren't terrible, but certain elements of them feel very sparse - variable inspection is certainly one of them. Inspecting disassembly and code for issues isn't best done through the CUDA debugger; rather, it's easier to launch a thorough performance benchmark so you can grab in-depth data that helps ease reading through the disassembly.
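The early-exit removal phygon describes can be sketched host-side. This is a minimal C++ sketch (not the actual cloud shader - the density math here is made up) of why the fix works: instead of a data-dependent break, the loop runs a fixed trip count and masks the contribution to zero, so every thread in a warp does identical work, and the arithmetic is arranged so the result comes out the same.

```cpp
#include <cmath>

// Branchy version: early exit saves iterations on a CPU, but on a GPU every
// thread in a warp waits for the slowest one, and the data-dependent break
// can stop the compiler from flattening the surrounding switch/loop.
float accumulate_branchy(float density, int maxSteps) {
    float sum = 0.0f, transmittance = 1.0f;
    for (int i = 0; i < maxSteps; ++i) {
        if (transmittance < 0.01f) break;  // data-dependent early exit
        sum = std::fma(transmittance, density, sum);
        transmittance *= (1.0f - density);
    }
    return sum;
}

// Branchless version: fixed trip count, contribution masked to zero instead
// of exiting. The ternary compiles to a select/predicated move, not a branch,
// so warps stay in lockstep; once the mask hits zero, nothing changes.
float accumulate_branchless(float density, int maxSteps) {
    float sum = 0.0f, transmittance = 1.0f;
    for (int i = 0; i < maxSteps; ++i) {
        float mask = (transmittance < 0.01f) ? 0.0f : 1.0f;
        sum = std::fma(mask * transmittance, density, sum);
        transmittance *= (1.0f - mask * density);
    }
    return sum;
}
```

The trade-off is doing "wasted" iterations after the cutoff, but on a GPU those iterations were being paid for anyway by whichever thread in the warp ran longest.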
Problem is, when you need to use the CUDA debugger, you probably can't use the performance profiler. I was having a problem recently where nvcuda.dll threw an access violation exception after generating a surface object and trying to exit the scope of my module ctor, but only in release mode. I thought this might be an issue with asynchronous operations between the CPU and GPU, so I added some device synchronization calls after creating the surface object and allocating memory, to make sure things were brought back together. No dice. I made sure to override the defines activating/de-activating my CUDA assert macro - still nothing. I toggled some random options in the compiler and BAM, it starts working again.

I've definitely bitched about that issue in the Discord, but it keeps happening and the fixes are invariably just random shit. Enabling GPU debugging and host-side debugging info while building in pseudo-release mode was of no aid, but random NVCC compiler things were, I guess. Also, I was half-asleep when I fixed it, so I have no recollection of what I toggled to fix things :v
Found this gem: [url]https://cses.fi/book.pdf[/url] About competitive programming. Offers some nifty advice about C++ as well. Regarding taking out local vars in GPU programming, how in the shit does the compiler not do that?
[QUOTE=DoctorSalt;51887764]Found this gem: [url]https://cses.fi/book.pdf[/url] About competitive programming. Offers some nifty advice about C++ as well. Regarding taking out local vars in GPU programming, how in the shit does the compiler not do that?[/QUOTE]

Not sure. The compiler is weird, but GPUs are weirder. Adding const to the variables passed to various device functions helped reduce memory usage in many places - presumably by avoiding copies, I guess? But when I tried to change to pass-by-reference, my usage of local memory would often [I]soar[/I], which doesn't make any sense. If anyone's interested in GPU memory and how it affects the programming practices used in GPGPU stuff, [URL="http://accel.cs.vt.edu/files/lecture11.pdf"]here's a great slideshow on GPU memory[/URL]. For me, my noise stuff really requires rather small block sizes of only 8x8 threads, to avoid too much fighting over register space among blocks in a grid.

I dunno though, I'm hopelessly out of my depth with this GPU stuff. Also out of my depth with most programming stuff. Not having a formal education in this bothers me more and more every day - there's so much I feel I don't understand yet and so much I still have to learn. If anyone's considering getting into CUDA, though, I don't entirely recommend "CUDA for Engineers" - it's the book my professor wrote, and while it's not bad, it seems pretty out of date (nothing that uses features newer than CUDA 3.0, for example). I haven't looked at other textbook options though; my best resource has just been the good ol' interwebs.

[editline]edit[/editline]

Damn, that pdf you linked is really useful. I was expecting a handful of slides or a light journal-type thing, but this is definitely something I'll be going through in detail.
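For what it's worth, the const/reference behavior lines up with how GPU compilers treat addressable variables. A rough C++ sketch of the pattern (the function and parameter names here are made up, not from the actual code):

```cpp
// By-value const: the parameter is a private copy, so the compiler is free to
// keep it in a register for the whole call; const just prevents accidental
// writes and can enable extra folding.
float curve_by_value(const float x, const float gain) {
    return x * gain + 0.5f;
}

// By-reference: the argument now needs an address. On a CPU that's cheap, but
// a GPU compiler may have to give the variable a real memory location (local
// memory, which is off-chip and slow) just so a reference to it can exist -
// one plausible reason local memory usage soars with pass-by-reference.
float curve_by_ref(const float& x, const float& gain) {
    return x * gain + 0.5f;
}
```

Both return identical results; the difference only shows up in where the compiler is allowed to put the arguments.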
Shadow mapping yeaboi [img]http://puu.sh/uoaJb.jpg[/img]
Decided I wanted to make something so I could practice - I've never made a real project before. [url=https://github.com/TeamEnternode/Command-Line-Calculator]Everyone has to start somewhere, I guess[/url]. It's not up to date, but it captures the general idea of what I want to accomplish. [sp]Working on the polynomial division before[/sp] Not worried about optimization yet.
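For the polynomial division part, one possible shape for it (a hypothetical helper sketched here, not code from the repo) is classic long division over double coefficients, stored lowest degree first:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Polynomial long division: divide a(x) by b(x), returning {quotient, remainder}.
// Coefficients are stored lowest degree first, e.g. {1, 0, 2} = 1 + 2x^2.
// Assumes b's leading coefficient is nonzero.
std::pair<std::vector<double>, std::vector<double>>
poly_divide(std::vector<double> a, const std::vector<double>& b) {
    int da = (int)a.size() - 1, db = (int)b.size() - 1;
    std::vector<double> q(std::max(da - db + 1, 1), 0.0);
    // Repeatedly cancel the leading term of the running remainder,
    // from the highest quotient coefficient down to the constant term.
    for (int i = da - db; i >= 0; --i) {
        double coef = a[i + db] / b[db];
        q[i] = coef;
        for (int j = 0; j <= db; ++j)
            a[i + j] -= coef * b[j];
    }
    a.resize(db > 0 ? db : 1);  // remainder has degree < deg(b)
    return {q, a};
}
```

For example, dividing x^2 - 1 by x - 1 gives quotient x + 1 and remainder 0. Exact division like this is fine in doubles; for a real calculator you'd eventually want rationals to dodge rounding.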
Most exciting waywo update yet: going into spectator cam with the trombone out, then back into first-person mode by pressing escape, no longer means the trombone uses the sword controls and can attack - it now uses the correct trombone controls.

On top of that (oh my goodness, I'm a madman), if you die while holding the trombone, it doesn't leave behind an extra trombone when you die, and the subsequent trombone you hold isn't super buggy (somebody stop me).

It gets even more exciting. I changed the client -> master server interaction to be based on UDP, so I didn't have to create TCP connections to get serverlist information. This means I can now just fire off pings to the master server. I used this to create a simple server browser which is non-functional, but you can refresh it and get the pings to the gameservers, as well as an updated list of gameservers from the master server! Wow! Immense! Outrageous!

That's it for today, I'm off to play Blackwake for a million hours
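The TCP -> UDP switch boils down to: no handshake, no connection state, just one datagram out and one back. A minimal POSIX-sockets sketch of that plumbing (loopback only; the "LIST" request and reply format are placeholders - the real master-server protocol is whatever you define):

```cpp
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <cstdint>
#include <string>

// Create a UDP socket bound to loopback; pass port 0 to let the OS pick,
// and the chosen port is written to *port_out.
int udp_open(uint16_t port, uint16_t* port_out) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in a{};
    a.sin_family = AF_INET;
    a.sin_port = htons(port);
    inet_pton(AF_INET, "127.0.0.1", &a.sin_addr);
    bind(fd, (sockaddr*)&a, sizeof a);
    if (port_out) {
        socklen_t len = sizeof a;
        getsockname(fd, (sockaddr*)&a, &len);
        *port_out = ntohs(a.sin_port);
    }
    timeval tv{1, 0};  // 1-second receive timeout stands in for "no answer"
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
    return fd;
}

// Fire-and-forget: one datagram at the given loopback port, no connection.
void udp_send(int fd, uint16_t port, const std::string& msg) {
    sockaddr_in a{};
    a.sin_family = AF_INET;
    a.sin_port = htons(port);
    inet_pton(AF_INET, "127.0.0.1", &a.sin_addr);
    sendto(fd, msg.data(), msg.size(), 0, (sockaddr*)&a, sizeof a);
}

// Wait for one datagram; empty string on timeout.
std::string udp_recv(int fd) {
    char buf[1500];  // one MTU-ish datagram
    ssize_t n = recvfrom(fd, buf, sizeof buf, 0, nullptr, nullptr);
    return n > 0 ? std::string(buf, n) : std::string();
}
```

Because the kernel buffers datagrams, a client and a stand-in "master server" can even trade a request and reply sequentially in a single thread, which makes this pattern easy to test.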
[QUOTE=Icedshot;51893117][...] It gets even more exciting. I changed the client -> master server interaction to be based on UDP so I didn't have to create TCP connections to get serverlist information. This means that I can now just fire off pings to the master server. I used this to create a simple server browser which is non functional, but you can refresh it and get the pings to the gameservers as well as getting an updated list of gameservers from the master server! Wow! Immense! Outrageous! [...][/QUOTE] Purely for information, what's the reply size/request size ratio on that?
Today I photoscanned the base of a tree, baked it and put it in my renderer: [t]https://puu.sh/uppbc.jpg[/t] [t]https://puu.sh/uppbl.jpg[/t] [t]https://puu.sh/uppbt.jpg[/t]