I am in the midst of a large project, and I am running into some weird issues. Here's the deal: I am messing about with making an operating system. I am by no means claiming to be good at it, or saying it will do 1/100th of what real operating systems can do. With that disclaimer out of the way, here we go.
For this project, I decided I would also write my own libc and libk. I thought it would offer me a steep yet rewarding experience and get me more familiar with C programming in general. So I decided I would write a printf( const char *, ... ) function to print data. And so the headaches began. Here is the code I am testing with:
int printf( const char * format, ...){
int written = 0;
va_list parameters;
va_start(parameters, format);
while (*format != '\0'){
switch (*format){
case '%' :{
format++;
switch (*format){
case 's' :{
char * str = va_arg( parameters, char *);
terminal_writestring( str );
break;
}
case 'c' :{
//char a = (char)va_arg( parameters, int );
//terminal_putchar( a );
break;
}
}
format++;
break;
}
case '\n' :{
terminal_newline();
format++;
break;
}
default :{
terminal_putchar( *format );
format++;
written++;
break;
}
}
}
va_end( parameters );
return written;
}
When tested with
printf( "%s\n", "Test" )
It outputs nonsense with "S ". Oddly enough, when I exit the function from the "%s" case, it prints fine. If I only run one iteration of the while loop, it also works fine. So my thinking is that it is one of 2 problems:
a) The stack is messed up somehow
b) The compiler is outputting incorrect code.
Here is an objdump from printf.o: (I also included a few comments to aid reading where what is happening, for my own benefit at first)
00000000 <printf>:
#Allocated 0x18 bytes for variables
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 83 ec 18 sub esp,0x18
#stack:
#
# 0x10
# 0x0c
# 0x08 (WORD) char * format
# 0x04
# -------------- ebp -------------
# -0x08
# -0x10
# -0x14
# -0x18 (DWORD) 0
#
6: c7 45 f4 00 00 00 00 mov DWORD PTR [ebp-0xc],0x0
d: 8d 45 0c lea eax,[ebp+0xc]
10: 89 45 ec mov DWORD PTR [ebp-0x14],eax
13: eb 78 jmp 8d <printf+0x8d>
# While loop
15: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
18: 0f b6 00 movzx eax,BYTE PTR [eax]
1b: 0f be c0 movsx eax,al
# Test case '\n'
1e: 83 f8 0a cmp eax,0xa
21: 74 41 je 64 <printf+0x64>
# Test case '%'
23: 83 f8 25 cmp eax,0x25
26: 75 47 jne 6f <printf+0x6f>
# Case '%' Routine:
# increment format
28: 83 45 08 01 add DWORD PTR [ebp+0x8],0x1
2c: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
2f: 0f b6 00 movzx eax,BYTE PTR [eax]
32: 0f be c0 movsx eax,al
# Test case 'c'
35: 83 f8 63 cmp eax,0x63
38: 74 23 je 5d <printf+0x5d>
# Test case 's'
3a: 83 f8 73 cmp eax,0x73
3d: 75 1f jne 5e <printf+0x5e>
# Case 's' Routine
3f: 8b 45 ec mov eax,DWORD PTR [ebp-0x14]
42: 8d 50 04 lea edx,[eax+0x4]
45: 89 55 ec mov DWORD PTR [ebp-0x14],edx
48: 8b 00 mov eax,DWORD PTR [eax]
4a: 89 45 f0 mov DWORD PTR [ebp-0x10],eax
4d: 83 ec 0c sub esp,0xc
50: ff 75 f0 push DWORD PTR [ebp-0x10] ## ds:[ebp -0x10]
53: e8 fc ff ff ff call 54 <printf+0x54>
58: 83 c4 10 add esp,0x10
5b: eb 01 jmp 5e <printf+0x5e>
# Case 'c' Routine
5d: 90 nop
# Increment format
5e: 83 45 08 01 add DWORD PTR [ebp+0x8],0x1
62: eb 29 jmp 8d <printf+0x8d>
#New line
64: e8 fc ff ff ff call 65 <printf+0x65>
69: 83 45 08 01 add DWORD PTR [ebp+0x8],0x1
6d: eb 1e jmp 8d <printf+0x8d>
6f: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
72: 0f b6 00 movzx eax,BYTE PTR [eax]
75: 0f be c0 movsx eax,al
78: 83 ec 0c sub esp,0xc
7b: 50 push eax
7c: e8 fc ff ff ff call 7d <printf+0x7d>
81: 83 c4 10 add esp,0x10
84: 83 45 08 01 add DWORD PTR [ebp+0x8],0x1
88: 83 45 f4 01 add DWORD PTR [ebp-0xc],0x1
8c: 90 nop
#exit-test routine
8d: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
90: 0f b6 00 movzx eax,BYTE PTR [eax]
93: 84 c0 test al,al
95: 0f 85 7a ff ff ff jne 15 <printf+0x15>
9b: 8b 45 f4 mov eax,DWORD PTR [ebp-0xc]
9e: c9 leave
9f: c3 ret
I hadn't finished the stack representation at the top, but this is as far as I got before deciding I needed to ask for some help to solve this problem. I've been pulling my hair out over this, since other people's code that worked for them doesn't work here, so I'm pointing toward the compiler.
My compiler is an i386-elf cross compiler.
The stack segment is different from the data segment. Other function calls work fine.
Bloody fed up of this one problem taking up most of my time thinking about what is wrong. Any help is appreciated, rewarded with coins and other sexual favours. Thanks
It's been a while since I've done anything in C, but to my knowledge you're passing a pointer with *, and as such whatever is in your passed variable is what would normally be put there. But instead you're seeing what's currently taking up that spot in memory, hence gibberish.
I guess the question is why are you using a pointer instead of passing by reference?
Would that be for the "char * format" or the va_args? To my knowledge, when dealing with a string of characters in C, it is of type "char *". I am not sure if it's anything to do with that since by uncommenting other parts of the code, it works. For example:
int printf( const char * format, ...){
int written = 0;
va_list parameters;
va_start(parameters, format);
while (*format != '\0'){
switch (*format){
case '%' :{
format++;
switch (*format){
case 's' :{
char * str = va_arg( parameters, char *);
terminal_writestring( str );
goto hell;
break;
}
case 'c' :{
//char a = (char)va_arg( parameters, int );
//terminal_putchar( a );
break;
}
}
format++;
break;
}
case '\n' :{
terminal_newline();
format++;
break;
}
default :{
terminal_putchar( *format );
format++;
written++;
break;
}
}
}
hell:
va_end( parameters );
return written;
}
works. However:
int printf( const char * format, ...){
int written = 0;
va_list parameters;
va_start(parameters, format);
while (*format != '\0'){
switch (*format){
case '%' :{
format++;
switch (*format){
case 's' :{
char * str = va_arg( parameters, char *);
terminal_writestring( str );
goto hell;
break;
}
case 'c' :{
char a = (char)va_arg( parameters, int );
terminal_putchar( a );
break;
}
}
format++;
break;
}
case '\n' :{
terminal_newline();
format++;
break;
}
default :{
terminal_putchar( *format );
format++;
written++;
break;
}
}
}
hell:
va_end( parameters );
return written;
}
doesn't. These are both executed with "printf( "%s\n", "test")". Of course, it doesn't print the newline because I exit the while loop with "goto hell;".
So I am pretty sure it is something the compiler is doing. I have turned off -O2, however I need to move some code about in my bootstrapper to load more than just 2kb, since turning off -O2 causes the code to be twice as large. So I have yet to see whether the optimization level causes the problem, however it should work regardless.
Long story short, and correct me if I am wrong, I am confident that my argument types are correct, since it "CAN" work, if I do the "goto hell;" test
It shouldn't be entering the case 'c' at all, right?
Are you able to step through it and confirm that it is entering the 'c' case? It sounds like it might for some reason be jumping into that case, if all you're changing is commenting out the behavior for 'c'.
It might also be helpful to make sure it's only entering that switch statement ONCE.
Can confirm that in this example, %c is never entered. The % switch is only entered once also. I am going to try using a different gcc version (currently 9.0.0, which is marked as experimental).
I have yet to try this without optimization flags on the compiler, because my bootloader restricts me to 2kb, and 4+kb is required with no optimization. I think it's definitely some compiler trickery happening.
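One way to separate "my logic is wrong" from "my environment is wrong" is to build the same function on an ordinary hosted compiler with stand-in terminal routines. The following is only a rough sketch: host_printf, host_out, host_reset and host_matches are made-up names, and the terminal_* stubs just append to a buffer rather than doing whatever the real kernel versions do.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side stand-ins for the kernel's terminal_* routines:
   they append to a buffer instead of writing to the screen. */
static char host_out[256];
static size_t host_len;

static void host_reset(void) {
    host_len = 0;
    host_out[0] = '\0';
}

static int host_matches(const char *expected) {
    return strcmp(host_out, expected) == 0;
}

static void terminal_putchar(char c) {
    assert(host_len + 1 < sizeof host_out);  /* keep room for the terminator */
    host_out[host_len++] = c;
    host_out[host_len] = '\0';
}

static void terminal_writestring(const char *s) {
    while (*s != '\0')
        terminal_putchar(*s++);
}

static void terminal_newline(void) {
    terminal_putchar('\n');
}

/* Same control flow as the printf posted above, renamed so it does not
   clash with the host libc. */
static int host_printf(const char *format, ...) {
    int written = 0;
    va_list parameters;
    va_start(parameters, format);
    while (*format != '\0') {
        switch (*format) {
        case '%':
            format++;
            switch (*format) {
            case 's':
                terminal_writestring(va_arg(parameters, char *));
                break;
            case 'c':
                terminal_putchar((char)va_arg(parameters, int));
                break;
            }
            format++;
            break;
        case '\n':
            terminal_newline();
            format++;
            break;
        default:
            terminal_putchar(*format);
            format++;
            written++;
            break;
        }
    }
    va_end(parameters);
    return written;
}
```

If host_printf( "%s\n", "Test" ) leaves "Test" plus a newline in host_out on the host, the C itself is probably fine and the difference lies in how the kernel environment is set up.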
Sorry for the spam today guys, but I have some answers to my problem.
Now this is one of those cases where me "trying" to be smart has come back to bite me on the arse. So here is what was happening:
If we look closer at the Case 's' sub routine:
3f: 8b 45 ec mov eax,DWORD PTR [ebp-0x14]
42: 8d 50 04 lea edx,[eax+0x4]
45: 89 55 ec mov DWORD PTR [ebp-0x14],edx
48: 8b 00 mov eax,DWORD PTR [eax]
4a: 89 45 f0 mov DWORD PTR [ebp-0x10],eax
Now, ebp-0x14 holds the va_list (parameters), not the format pointer: it was set up at offset 0xd with lea eax,[ebp+0xc], which is the address of the first variadic argument on the stack (format itself lives at ebp+0x8). It is a pointer, so really let's just think about it as a number. We move this into eax, load edx with eax+4, and store that back to ebp-0x14, so we have essentially advanced the va_list past one 4-byte argument. This is simply what va_arg( parameters, char * ) compiles to.
Current sit-rep: eax holds the address of the current variadic argument (an offset into the stack), edx holds the address of the next one.
Next, we get the value at eax, and store it in eax.
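For anyone following along, here is a rough C model of those five instructions, assuming the slot at ebp-0x14 is the va_list cursor (it is initialised by the lea eax,[ebp+0xc] near the top of the dump). The names model_va_list, model_va_start and model_va_arg32 are made up for illustration; this is not how any particular compiler's stdarg.h is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: the caller's pushed arguments are 4-byte stack slots,
   and the va_list is just a pointer to the next unread slot. */
typedef struct {
    const uint32_t *next;  /* plays the role of the slot at [ebp-0x14] */
} model_va_list;

/* va_start: point just past the last named argument
   (lea eax,[ebp+0xc]; mov [ebp-0x14],eax). */
static void model_va_start(model_va_list *ap, const uint32_t *last_named) {
    assert(ap != NULL && last_named != NULL);
    ap->next = last_named + 1;
}

/* va_arg for a 4-byte argument, in the same order as the disassembly. */
static uint32_t model_va_arg32(model_va_list *ap) {
    const uint32_t *cur = ap->next;  /* mov eax,[ebp-0x14]                    */
    ap->next = cur + 1;              /* lea edx,[eax+0x4]; mov [ebp-0x14],edx */
    return *cur;                     /* mov eax,[eax]  (no ebp in the address) */
}
```

The last step is the interesting one: the load goes through a plain register, so on real hardware it uses the default segment for that addressing form rather than the stack segment.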
Now, a little background:
When doing [eax + esi + 4] or WHATEVER that doesn't use ebp or esp as the base in place of eax, the CPU understands this as ds:[eax + esi + 4]. Whereas when ebp is used as the base, it uses the stack segment: ss:[ebp + esi + 4]. This is where the problem comes from.
I was trying to be smart by setting up my ss register to hold a segment different to that of ds. This makes my memory layout "not flat". I was doing this so I could segregate my stack from being read or written to from normal ds style memory access (don't ask me why, because after all of this I have no idea). Now to cut a long explanation short, here is an example of where this let me down:
lea eax, DWORD PTR [ ebp ] ;Load ebp into eax
mov [eax], "something" ;Write to eax (aka ebp)
In the case where ds != ss, "something" would not be written to ss:[ebp] as you would expect from a flat memory model, but rather to ds:[eax]. In my case, this is somewhere other than the stack.
Coins go to flak since he was closest by simply saying "What you're trying to access isn't there". Leaving this post up for anyone else who is ill-equipped for OS theory.
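To put numbers on it, here is a tiny model of the address translation, ignoring segment limits and paging. The segment struct, linear and same_location are hypothetical names, and the bases in the usage note below are invented purely for illustration.

```c
#include <assert.h>  /* for the checks in the usage note */
#include <stdint.h>

/* Hypothetical model of protected-mode segmentation with limits ignored:
   linear address = segment base + offset. */
typedef struct {
    uint32_t base;
} segment;

static uint32_t linear(segment seg, uint32_t offset) {
    return seg.base + offset;
}

/* Returns 1 when an offset copied out of ebp and then dereferenced through
   ds hits the same linear address as ss:[ebp], and 0 otherwise. */
static int same_location(segment ss, segment ds, uint32_t ebp) {
    uint32_t eax = ebp;  /* lea eax,[ebp] copies the offset only, not the segment */
    return linear(ds, eax) == linear(ss, ebp);
}
```

With a flat layout, say both bases at 0, same_location returns 1 for any ebp. With a made-up ss base of 0x100000 and a ds base of 0, it returns 0: the same offset lands 1 MiB apart, which matches reading a variadic argument from the wrong place.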