Problem Set 5 : Forensics

Hello guys! Welcome back. This is CS50! We’re going to start a Problem Set 5. In this problem set, you’ll get more comfortable with data structures, hexadecimal, and pointers. Also, acquaint you with file I/O. As always, before we get started you will want to read these :

Recommended Reading :
http://www.cprogramming.com/tutorial/cfileio.html
http://en.wikipedia.org/wiki/BMP_file_format
http://en.wikipedia.org/wiki/Hexadecimal
http://en.wikipedia.org/wiki/Jpg
But, the Wikipedia articles are a bit dense, so feel free to skip it.

Recommended watching :
File I/O : https://www.youtube.com/watch?v=KwvObCA04dU
Structs : https://www.youtube.com/watch?v=EzRwP7NV0LM
Valgrind : https://www.youtube.com/watch?v=EzRwP7NV0LM
GDB : https://www.youtube.com/watch?v=USPvePv1uzE


Getting Started

Welcome back!
As always, you’ll need to read a long the specification. http://d2o9nyf4hwsci4.cloudfront.net/2014/x/psets/5/pset5/pset5.html
Open you appliance, then open up the terminal window and execute

update50

To make sure your appliance is up-to-date.
Like Problem Set 3 and 4. This problem set comes with the distribution code that you’ll need to download before getting started. Go ahead, and execute :

cd ~/Dropbox

In order to navigate your way to ~/Dropbox. Next, execute :

wget http://cdn.cs50.net/2013/fall/psets/5/pset5/pset5.zip

In order to download a ZIP (i.e., compressed version) of this problem set’s distro.If you then execute :

ls

you should see that you now also have a pset5 directory. You’re now welcome to delete the ZIP file with the below :

rm -f pset5.zip

Now dive into that pset5 directory by executing the below :

cd pset5

Now execute :

ls

And you should see that the directory contains the below :

bmp/ jpg/ questions.txt

How fun! There are two subdirectories and a file. Who know what’s inside ? Let’s get started.
Have you ever seen Windows XP’s default wallpaper (think the rolling hills and blue skies) ? Then, you’ve seen BMP. If you ever looked at webpage you’ve probably seen GIF. If you’ve ever looked at digital photo you’ve probably seen JPEG. And, if you’ve ever taken a screenshot of Mac, you’ve seen PNG. Read up a bit of GIF, JPEG, BMP and PNG file formats, then open up questions.txt on gedit. Be sure you can answer these questions :

0. How many different colors does each format support?
1. Which of these formats supports animation?
2. What’s the difference between lossy and lossless compression?
3. Which of these formats is lossy-compressed?

Read up at the MIT’s article : http://cdn.cs50.net/2013/fall/psets/5/garfinkel.pdf
And try to answer these questions on ~/Dropbox/pset5/questions.txt.

4. What happens, technically speaking, when a file is deleted on a FAT file system?
5. What can someone like you do to ensure (with high probability) that files you delete cannot be recovered?


Anyway, you have to do 3 parts in this problem set and they are :

  • whodunit.c
  • resize.c
  • recover.c

WHODUNIT and RESIZE

http://d2o9nyf4hwsci4.cloudfront.net/2014/x/psets/5/pset5/pset5.html#_whodunit_and_more

WHODUNIT

We’re going to find out. In this game clue, you might be given a little red image. And the image is very red and spotty. Your job is to revealed the hidden message among this file’s red noise is a drawing whodunit.

rednoise

In whodunit, you are given a Bitmap image that was very spotty and red. And run the whodunit program to reveal the hidden message. So, let’s break this problem.

TODO :
1. Open the file
2. Update header info
3. Read clue scanline, pixel by pixel
4. Change pixel color as necessary
5. Write into verdict scanline, pixel by pixel

First, you’ll want to open the file, the clue that you’ve given. And also create a verdict bitmap file. And then you’ll want to update the bitmap header info for the verdict outfile. More on that later. And then, you’ll wan to read into the clue, scanline pixel by pixel, change pixel color as necessary and write into verdict scanline pixel by pixel. How do we start it? Luckily, we have a copy.c in the distribution code and this is going to prove quite useful to us. Here is the sneak peak of copy.c

openfile
And, then read infile and update the outfile’s header

readinfileandupdate
Then, read each pixel

read each pixel
And then, write the pixel into the outfile:

fwrite(&triple, sizeof(RGBTRIPLE), 1, outptr);

So, your first step might be to run the following command in the terminal:

cp copy.c whodunit.c

This would create a copy from a copy.c named whodunit.c

Pseudocode for whodunit.c :
1. Open the file (still same like copy.c , so no need to change anything)
2. Update header info (still same)
3. Read clue scanline, pixel by pixel (still same)
4. Change pixel color as necessary
5. Write into verdict scanline, pixel by pixel
Ok, I’ll walk you thru it.

1. Opening files :

FILE* inptr = fopen("foo.bmp", "r");

Opens foo.bmp for reading that represented by “r“.

 FILE* outptr = fopen("bar.bmp", "w"); 

Opens bar.bmp for writing that represented by “w“.

Next step, we’re going to update header info. What’s the header info ? Well, first we have to know what the bitmap is.

offset
Open up the bmp.h on gedit.
What does the structure of BITMAPFILEHEADER contain ? The BITMAPFILEHEADER structure contains information about the type, size, and layout of a file that contains a DIB [device-independent bitmap].

bitmapheaderinfo

Then, what does the structure of BITMAPINFOHEADER contain ? The BITMAPINFOHEADER structure contains information about the dimensions and color format of a DIB [device-independent bitmap].
bitmapinfoheader

Header :

  • biSizeImage : is the total size of image (in bytes) and include pixel and padding. More on those later.
  • biWidth : is the width of image (in pixels) , this doesn’t incude padding.
  • biHeight : is the height of image (in pixels).

There are 2 structs on Header : BITMAPFILEHEADER and BITMAPINFOHEADER.

So, how do we update the header info? We first want to see is anything changing in the clue (the infile) to the verdict (the outfile).
2. Updating Header Info :

  • New bmp → new header info
  • Is anything changing ? Well, no actually. Because we’re going to changing the color.
    File size (not changing).
    Image size (not changing).
    Width (not changing).
    Height (not changing).

So, you’re right for now by just copy it. Next, how do we read every pixel ?
3. Reading file :

fread(&data, size, number, inptr);

&data : is a pointer to a struct which will contain the bytes you’re reading.
size : size of each elements to read , here the function sizeof will coming handy.
number : is a number of each elements to read.
inptr : FILE* to read from
Then, let’s change the color as we need :

rgbtriple
As you can see, inside of the rgb structure, there are rgbtBlue, which is the amount of blue, rgbtRed which is the amount of red, and rgbtGreen which is the amount of green.

4. Changing pixel color :
Each color represented by 3 bytes :

  • amount of blue
  • amount of red
  • amount of green
    ff0000 blue
    ffffff → white

So, can you change the color to green ? You will want to call RGBTRIPLE (a struct to represent pixel). By using dot (.) operator.
See example below :

RGBTRIPLE triple;
triple.rgbtBlue = 0x00;
triple.rgbtGreen = 0xff;
triple.rgbtRed= 0x00;

With that, it gives me a green pixel. Next, what if I want to check the value of something?
See example below :
If you’d like check the value of blue. You may execute :

if (triple.rgbtBlue == 0xff)
{
   printf("I’m feeling blue!");
}

For whodunit, let me suggest you to use “remove all RED” method. But still, there are a couple of ways to solve this problem. Ok, so now assume we’ve changed the pixel color. The next step is write into the verdict scanline.

5. Writing into verdict scanline, pixel by pixel :

Again you’ll want to take a look copy.c and see how to writing files.

fwrite(&data, size, number, outptr);

&data : is a pointer to a struct which will contain the bytes you’re reading from.
size
number
outptr : FILE* to write to.

But, you also have to determine the padding. What is padding? Well, every RGBt pixel is 3 bytes long. But, the scanline for bitmap image has to be multiple of 4 bytes. If the number of pixels isn’t multiple of 4, we need “padding”. Padding is just the represented of zero. So, how do we read and write this ? Well, you can’t actually “fread” padding but you can calculate it.

Padding :

  • Padding isn’t a RGBTRIPLE so we can’t fread padding.
  • The clue and the verdict have the same width, so the padding is the same!
padding = (4 - (bi.biWidth * sizeof(RGBTRIPLE)) % 4) % 4;

Now, we have the padding formula. Then, how can we write it? Using “fputc”.

fputc (chr, outptr);

chr : char to write
outptr : FILE* to write to.

fputc(0x00, outptr);

Then, you’ll want to move file position indicator. Now, you’ll want to use “fseek”.

fseek(inptr, amount, from);

inptr : FILE* to seek in.
amount : number of bytes to move cursor.
from : relates to reference points.

  • SEEK_CUR : (current position in file).
  • SEEK_SET : (beginning of file).
  • SEEK_END : (end of file).

So, we’ll want to use fseek to skip over the padding of infile. So, once you open verdict, you can see who’s the creator of this problem. You’re done whodunit!

RESIZE

Alright, next challenge! Implement now in resize.c a program called resize that resizes 24-bit uncrompressed BMPs by a factor of n. Your program should accept exactly three-command line arguments, per the below usage, whereby the first(n) must be a positive integer less than or equal to 100, the second the name of the file to be resized, and the third name of resized version to be written.

Usage : ./resize infile outfile

With a program like this, we could have created large.bmp out of small.bmp by resizing the latter by a factor of 4 (i.e multiplying both its width and its height by 4), per the below :

./resize 4 small.bmp large.bmp

You’re welcome to get started by copying (yet again) copy.c and naming the copy resize.c. But spend some time thinking about what it means to resize a BMP.

TODO :

  1. Open file
  2. Update header info for outfile
  3. Read each scanline pixel by pixel
  4. Resize horizontally
  5. Remember padding!
  6. Resize vertically
  • Opening file

You know how to do that! It’s still same and also you’re welcome to get copying again copy.c by execute :

cp copy.c resize.c
  • Updating header info for outfile

Because we have new BMP we also have to update the header info. What’s changing ? File size, image size, width and height are have to change. Try to look at the header info (again) :
Header :

  • biSizeImage : is the total size of image (in bytes) and include pixel and padding.
  • biWidth : is the width of image (in pixels) , this doesn’t include padding.
  • biHeight : is the height of image (in pixels).
  • There are 2 structs on Header : BITMAPFILEHEADER and BITMAPINFOHEADER
  • Reading scanline pixel by pixel

Again, you’ll wan to call file I/O function “fread”. Look at below (again) :

fread(&data, size, number, inptr);

&data : is a pointer to a struct which will contain the bytes you’re reading.
size : size of each elements to read , here the function sizeof will coming handy.
number : is a number of each elements to read.
inptr : FILE* to read from.

  • Resizing horizontally

If I could assume a pixel as a block . How do we resize it horizontally ? well, let’s assume :

n = 2

Then, the block should be two times of the current width. OK, this time you’ll want to use “fwrite” function.

fwrite(&data, size, number, outptr);

&data : is a pointer to a struct which will contain the bytes you’re reading from.
size
number
outptr : FILE* to write to.
Can you figure out how? I leave you to determine it by yourself.

  •  Remember padding.

Do you remember how ? If don’t please take a look at the previous explanation about padding on whodunit.

  • Resize Vertically

Several ways to do this :

  1. “Rewrite” methods :
    • Remember all of the pixels in a array
    • And Write as many as needed

or

2. “Re-copy” methods :
• Go back to the start of the original row
• Repeat the horizontal resizing.

So, we’ll understand resize.c is all about :
1. Each pixel repeated n times
2. Every row repeated n times
With that you’ll have a Bitmap large image! If you’d like to check the correctness of your program with check50, you may execute the below.

check50 2014/x/pset5/resize bmp.h resize.c

If you’re done with it, be sure you can answer the following questions in ~/Dropbox/pset5/questions.txt.

6. What’s stdint.h ?
7. What’s the point of using uint8_t , uint32_t , int32_t , and uint16_t in a program?
8. How many bytes is a BYTE , a DWORD , a LONG , and a WORD , respectively? (Assume a 32-bit architecture like the CS50 Appliance.)
9. What (in ASCII, decimal, or hexadecimal) must the first two bytes of any BMP file be? (Leading bytes used to identify file formats(with high probability) are generally called “magic numbers.)”
10. What’s the difference between bfSize and biSize ?
11. What does it mean if biHeight is negative?

12. What field in BITMAPINFOHEADER specifies the BMP’s color depth (i.12. e., bits per pixel)?
13. Why might fopen return NULL in copy.c:37 ?
14. Why is the third argument to fread always 1 in our code?
15. What value does copy.c:70 assign padding if bi.biWidth is 3 ?
16. What does fseek do?
17. What is SEEK_CUR ?


 CSI (Computer Science Investigation)

Alright, so let’s pull up all your new skill into the last. Take a look at http://cdn.cs50.net/2014/x/psets/5/pset5/pset5.html#_csi_computer_science_investigation

Long story shorts, in this problem you have to recover 50 photos that’ve been deleted. Fortunately, in the Computer world “deleted” tends not to mean “deleted” so much as “forgotten”. Let’s make a file called recover.c inside the ~/Dropbox/pset5/jpg a program that will recover all photos.

What is the format for photos? Well, it should be a JPEG. You’ll know that JPEGs are more complicated than BMPs but luckily JPEGs have signatures patterns of bytes. Specially, the first four bytes of most JPEGs are either :

0xff 0xd8 0xff 0xe0
or
0xff 0xd8 0xff 0xe1

For simplicity, you should hard code ”card.raw” in your program, your program need not accept any command-line arguments. When executed, your program should recover every one of JPEGs from “card.raw“. Your program should number the files its outputs by naming each ###.jpg where ### is three-digits decimal number from 000 on up (you’ll want to call sprintf function). How to check whether your program is correct? Well, just simply double-click and take a look! If the photo are appears, your operation was likely a success!
Okay, let’s get started.

TODO :
1. Open memory card file
2. Find the beginning of JPG
3. Open a new JPG
4. Write 512 bytes until new JPG is found
5. Detect end of file

  • Opening memory card file

You know how to solve this already! Don’t forget to use File I/O function.

  • Find the beginning of JPG

Well, just like a BMP, JPEG is also just a sequence of bytes. As you’ve read above, the signatures of JPEGs are either :

0xff 0xd8 0xff 0xe0
or
0xff 0xd8 0xff 0xe1

And, it’s stored side by side in the memory card. How can do we read the 512 bytes? Well, let’s go back to “fread” struct.

fread(&data, size, number, inptr);

&data : buffer.
We have to read 512 bytes at the time. Can you figure out how? You’ll want to define BUFFER and storing it by 512.

  • Opening a new JPG

Since you’ve reached the first JPG, then how do we open it? Let’s make the new JPG. The file name of JPG is ###.jpg the ### is going to be number format. So, the first jpeg you’ve found is 000.jpg and so forth. So keep track!

Then, how do you actually make the jpg? You’ll want to call “sprintf” function. It’s a bit similar with“printf”. But, in this case, “sprintf” will print the file out in the current directory and not in the terminal.

sprint(title, "%d.jpg", 2);

title : char array to store the resultant string
%d.jpg : will give you name like 2.jpg.. so, how do you make it to be 002.jpg?

  • Write 512 bytes until new JPG is found

Remember “fwrite

  • Detect end of file

Ok, so it’s a bit confusing now. Let me get you the pseudocode for recover.c.
Pseudocode :

open card file
    while eof check
         fread first 512 bytes
         if signature is JPEG
              check if outptr != null
              close output file
              update file name counter
            create new filename
            open file
            write the current 512 bytes to the outfile
         else if signature is not JPG && outptr != null
            fwrite current 512 bytes to outfile
close any remaining file

With that you’ve recovered all of the photos! Congratulations! If you’d like to check your correctness of your program with check50, you may execute the below :

check50 2014/x/pset5/recover recover.c

And you’re done!

This was Problem Set 5.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s