Problem Set 6 : Dictionary

Hello fellow coder! Welcome back to CS50! So, this is the beginning of make your own data structure and optimize your code’s (real-world) running time. What is the problem actually? Well, this problem set is about to make a “Spell-checker”. Like problem set 5, it will comes with the distribution code. How is the spell checker works? Well, we’re going to find out later. This was the hardest problem set that I ever done to be honest.

Now, let’s get started!
Per the instruction open up the terminal window, and execute :

update50

To make sure your appliance is up-to-date.Then, execute :

~/Dropbox

In order to navigate your way to ~/Dropbox directory. Then, execute :

wget http://cdn.cs50.net/2013/fall/psets/6/pset6/pset6.zip

To download a ZIP of this problem set’s disro. Next, execute :

ls

You could see the file called pset6.zip in your ~/Dropbox directory. Then, you’ll want to unzip the file by executing the below :

unzip pset6.zip

Then execute :

ls

To ensure that you have a directory called pset6. If you then want to delete the zip file by execute :

rm -f pset6.zip

To remove the pset6.zip file. Next execute :

cd pset6

Now let’s open what’s inside the pset6 directory by execute :

ls

Then you should the the directory contains the below :

dictionary.c dictionary.h Makefile questions.txt speller.c

Now, let’s get started!

First, let’s be friends with “running time“. On input size of n, an algorithm with a running time of n is asymptotically equivalent in term big O, to an algorithm with a running time of 2n. In the real world, though, the fact of the matter is that the latter feels twice as slow as the former.
The challenge of this problem set is to implement the fastest spell-checker you can! By “fastest”, though, we’re talking actual, real-world, noticeable seconds-none of that asymptotic stuff this time.
Take your time to take a look at dictionary.h on gedit. In this file you should see the cs50’s staff declared a dictionary’s functionality.

dictionary.h
They declared the prototype of four functions, take a note of what each should do. Anyway, you’ll want to implement 4 functions to complete this problem set, and they are : check, load, size and unload. The prototypes inside the dictionary.h are :

// maximum length for a word
// (e.g., pneumonoultramicroscopicsilicovolcanoconiosis)
#define LENGTH 45
/**
 * Returns true if word is in dictionary else false.
 */
bool check(const char* word);

/**
 * Loads dictionary into memory. Returns true if successful else false.
 */
bool load(const char* dictionary);

/**
 * Returns number of words in dictionary if loaded else 0 if not yet loaded.
 */
unsigned int size(void);

/**
 * Unloads dictionary from memory. Returns true if successful else false.
 */
bool unload(void);

#endif // DICTIONARY_H

Okay, next open up speller.c with gedit. And try to understand it. You won’t need to change anything in this file.

/****************************************************************************
 * speller.c
 *
 * Computer Science 50
 * Problem Set 6
 *
 * Implements a spell-checker.
 ***************************************************************************/

#include <ctype.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

#include "dictionary.h"
#undef calculate
#undef getrusage

// default dictionary
#define DICTIONARY "/home/cs50/pset6/dictionaries/large"

// prototype
double calculate(const struct rusage* b, const struct rusage* a);

int main(int argc, char* argv[])
{
    // check for correct number of args
    if (argc != 2 && argc != 3)
    {
        printf("Usage: speller [dictionary] text\n");
        return 1;
    }

    // structs for timing data
    struct rusage before, after;

    // benchmarks
    double ti_load = 0.0, ti_check = 0.0, ti_size = 0.0, ti_unload = 0.0;

    // determine dictionary to use
    char* dictionary = (argc == 3) ? argv[1] : DICTIONARY;

    // load dictionary
    getrusage(RUSAGE_SELF, &before);
    bool loaded = load(dictionary);
    getrusage(RUSAGE_SELF, &after);

    // abort if dictionary not loaded
    if (!loaded)
    {
        printf("Could not load %s.\n", dictionary);
        return 1;
    }

    // calculate time to load dictionary
    ti_load = calculate(&before, &after);

    // try to open text
    char* text = (argc == 3) ? argv[2] : argv[1];
    FILE* fp = fopen(text, "r");
    if (fp == NULL)
    {
        printf("Could not open %s.\n", text);
        unload();
        return 1;
    }

    // prepare to report misspellings
    printf("\nMISSPELLED WORDS\n\n");

    // prepare to spell-check
    int index = 0, misspellings = 0, words = 0;
    char word[LENGTH+1];

    // spell-check each word in text
    for (int c = fgetc(fp); c != EOF; c = fgetc(fp))
    {
        // allow only alphabetical characters and apostrophes
        if (isalpha(c) || (c == '\'' && index > 0))
        {
            // append character to word
            word[index] = c;
            index++;

            // ignore alphabetical strings too long to be words
            if (index > LENGTH)
            {
                // consume remainder of alphabetical string
                while ((c = fgetc(fp)) != EOF && isalpha(c));

                // prepare for new word
                index = 0;
            }
        }

        // ignore words with numbers (like MS Word can)
        else if (isdigit(c))
        {
            // consume remainder of alphanumeric string
            while ((c = fgetc(fp)) != EOF && isalnum(c));

            // prepare for new word
            index = 0;
        }

        // we must have found a whole word
        else if (index > 0)
        {
            // terminate current word
            word[index] = '';

            // update counter
            words++;

            // check word's spelling
            getrusage(RUSAGE_SELF, &before);
            bool misspelled = !check(word);
            getrusage(RUSAGE_SELF, &after);

            // update benchmark
            ti_check += calculate(&before, &after);

            // print word if misspelled
            if (misspelled)
            {
                printf("%s\n", word);
                misspellings++;
            }

            // prepare for next word
            index = 0;
        }
    }

    // check whether there was an error
    if (ferror(fp))
    {
        fclose(fp);
        printf("Error reading %s.\n", text);
        unload();
        return 1;
    }

    // close text
    fclose(fp);

    // determine dictionary's size
    getrusage(RUSAGE_SELF, &before);
    unsigned int n = size();
    getrusage(RUSAGE_SELF, &after);

    // calculate time to determine dictionary's size
    ti_size = calculate(&before, &after);

    // unload dictionary
    getrusage(RUSAGE_SELF, &before);
    bool unloaded = unload();
    getrusage(RUSAGE_SELF, &after);

    // abort if dictionary not unloaded
    if (!unloaded)
    {
        printf("Could not unload %s.\n", dictionary);
        return 1;
    }

    // calculate time to unload dictionary
    ti_unload = calculate(&before, &after);

    // report benchmarks
    printf("\nWORDS MISSPELLED:     %d\n", misspellings);
    printf("WORDS IN DICTIONARY:  %d\n", n);
    printf("WORDS IN TEXT:        %d\n", words);
    printf("TIME IN load:         %.2f\n", ti_load);
    printf("TIME IN check:        %.2f\n", ti_check);
    printf("TIME IN size:         %.2f\n", ti_size);
    printf("TIME IN unload:       %.2f\n", ti_unload);
    printf("TIME IN TOTAL:        %.2f\n\n", 
     ti_load + ti_check + ti_size + ti_unload);

    // that's all folks
    return 0;
}

/**
 * Returns number of seconds between b and a.
 */
double calculate(const struct rusage* b, const struct rusage* a)
{
    if (b == NULL || a == NULL)
    {
        return 0.0;
    }
    else
    {
        return ((((a->ru_utime.tv_sec * 1000000 + a->ru_utime.tv_usec) -
                 (b->ru_utime.tv_sec * 1000000 + b->ru_utime.tv_usec)) +
                ((a->ru_stime.tv_sec * 1000000 + a->ru_stime.tv_usec) -
                 (b->ru_stime.tv_sec * 1000000 + b->ru_stime.tv_usec)))
                / 1000000.0);
    }
}

Alright, next challenge is to implement load, check, size and unload as efficiently as possible. Now, open up dictionary.c on gedit. But, before you dive in, be sure to read the specifications.

OK, I assume you have read the specs, now let’s get you started! What do we need to do in dictionary.c ? In the dictionary.c, we have the helper function LOAD, which is loads the dictionary. Then, function CHECK, to checks if a given word is in dictionary. Function SIZE, returns the number of words in the dictionary. And function UNLOAD to frees the dictionary from memory .

Implement LOAD
A spell-checker will call LOAD function on the dictionary text file. For each word in the dictionary text file, LOAD function will store those words in the dictionary data structure. You can use either HASHTABLE or TRIE. But before we’re about dealing with linked list aswell, since the linked list is the buckets. In linked list, each node has a value, as well as a pointer to the next node. Dealing with linked list, is very important do not lose any links. And the last node points to NULL. How do we define it in C?

Linked list :

typedef struct node
{
   char word[LENGTH+1];
   struct node * next;
}
node;

And there we have linked list. It just one linked list. But hashtable is an whole array in linked list. We have the same structure like before. But if we wanna go with an actual hashtable, we’re gonna make node pointer array , here for example size of 500.

typedef struct node
{
 char word[LENGTH+1];
 struct node* next;
}
node;
node* hashtable[500]

Then, you’ll want to malloc the node for every new word that you have.

node* new_node = malloc(sizeof(node));
fscanf(file, "%s", new_node → word)

Where new_node -> word is the destination of that word.

Now, we’ll want to hash that word. That’s why we gonna call the HASH Function. Well, hash function takes a string and returns an index, in this case the index has to be less than the number of buckets. And the hash function has to be deterministic, that means, the same value needs to map to the same bucket every time. So, new_node → word has the word from dictionary. Then, hashing new_node → word will give us the index of the bucket of the hashtable. And it will insert that in the specific linked list.

Now, how about TRIE ?
Well, in TRIE function every node contains an array of node* s. One for every letter in alphabet + ”\’. And each element in the array points to another node. The structure is:

typedef struct node
{
   bool is_word;
   struct node* children[27];
}
node;
node* root;

OK now, let’s implement the LOAD function with TRIE in the pseudocode below :

for every dictionary word, iterate thru the trie
each element in children corresponds to a different letter.
check the value at children[i]
  if NULL , malloc the node, have children[i] point to it
  if not NULL, move to the new node and continue
if at the end of word, set is_word to true

Implement CHECK

Speller.c will call CHECK function on every word in the inputted text file and printing all miss-spelled words. CHECK function must operate case-insensitivity. And assume the strings will only alphabetical and/ or apostrophes.

Let’s look at how you might CHECK with HASHTABLE :

if the word exist, it can be found in the hashtable
which bucket would the word be in?
  hashtable[hash(word)]
search in that linked list by using strcmp

Now, let’s look at how you might CHECK with TRIE :

for each letter in input word
  go to corresponding element in children
  if NULL, word is misspelled
  if not NULL, move to next letter
  once at the end of input file
  check if is_word is true

Implement SIZE
Speller.c will also call SIZE function to determine of words in the dictionary. Just returns number of words in dictionary if loaded else 0 if not yet loaded.

Implement UNLOAD
And the last, speller.c will call UNLOAD function to free up memory.
Here’s how you might UNLOAD with hashtable :

for every element in hashtable
for every node in the linked list, you’ll want to free that node.

Freeing linked list :

node* cursor = head;
while(cursor != NULL)
 {
    node *temporary = cursor;
    cursor = cursor->next;
    free(temporary);
 }

Then, how about using TRIE ?

unload from bottom to top
travel to the lowest possible node
 free all pointers in children
 backtrack upwards, freeing all elements in each children array until you hit root node
recursion will coming handy

To make sure you’ve properly freed everything that you’ve malloced you can use VALGRIND. Running valgrind, will run your program how many bytes of memory you’re using and make sure that you’ve freed them all .

valgrind --leak-check=full ./speller ~cs50/pset6/texts/austinpowers.txt

Once you’ve done with dictionary.c, then compile and run your program as always. Then, check how fast your “spell-checker” works  by compare it with the staff’s. If it is done, you’ll want to check your correctness with check50.

Be sure to answer the following questions inside the questions.txt on gedit.

0. What is pneumonoultramicroscopicsilicovolcanoconiosis ?
1. According to its man page, what does getrusage do?
2. Per that same man page, how many members are in a variable of type struct rusage ?
3. Why do you think we pass before and after by reference (instead of by value) to calculate , even though we’re not changing their contents?
4. Explain as precisely as possible, in a paragraph or more, how main goes about reading words from a file. In other words, convince us that you indeed understand how that function’s for loop works.
5. Why do you think we used fgetc to read each word’s characters one at a time rather than use fscanf with a format string like “%s” to read whole words at a time? Put another way, what problems might arise by relying on fscanf alone?
6. Why do you think we declared the parameters for check and load as const ?

This was Problem Set 6.

Problem Set 5 : Forensics

Hello guys! Welcome back. This is CS50! We’re going to start a Problem Set 5. In this problem set, you’ll get more comfortable with data structures, hexadecimal, and pointers. Also, acquaint you with file I/O. As always, before we get started you will want to read these :

Recommended Reading :
http://www.cprogramming.com/tutorial/cfileio.html
http://en.wikipedia.org/wiki/BMP_file_format
http://en.wikipedia.org/wiki/Hexadecimal
http://en.wikipedia.org/wiki/Jpg
But, the Wikipedia articles are a bit dense, so feel free to skip it.

Recommended watching :
File I/O : https://www.youtube.com/watch?v=KwvObCA04dU
Structs : https://www.youtube.com/watch?v=EzRwP7NV0LM
Valgrind : https://www.youtube.com/watch?v=EzRwP7NV0LM
GDB : https://www.youtube.com/watch?v=USPvePv1uzE


Getting Started

Welcome back!
As always, you’ll need to read a long the specification. http://d2o9nyf4hwsci4.cloudfront.net/2014/x/psets/5/pset5/pset5.html
Open you appliance, then open up the terminal window and execute

update50

To make sure your appliance is up-to-date.
Like Problem Set 3 and 4. This problem set comes with the distribution code that you’ll need to download before getting started. Go ahead, and execute :

cd ~/Dropbox

In order to navigate your way to ~/Dropbox. Next, execute :

wget http://cdn.cs50.net/2013/fall/psets/5/pset5/pset5.zip

In order to download a ZIP (i.e., compressed version) of this problem set’s distro.If you then execute :

ls

you should see that you now also have a pset5 directory. You’re now welcome to delete the ZIP file with the below :

rm -f pset5.zip

Now dive into that pset5 directory by executing the below :

cd pset5

Now execute :

ls

And you should see that the directory contains the below :

bmp/ jpg/ questions.txt

How fun! There are two subdirectories and a file. Who know what’s inside ? Let’s get started.
Have you ever seen Windows XP’s default wallpaper (think the rolling hills and blue skies) ? Then, you’ve seen BMP. If you ever looked at webpage you’ve probably seen GIF. If you’ve ever looked at digital photo you’ve probably seen JPEG. And, if you’ve ever taken a screenshot of Mac, you’ve seen PNG. Read up a bit of GIF, JPEG, BMP and PNG file formats, then open up questions.txt on gedit. Be sure you can answer these questions :

0. How many different colors does each format support?
1. Which of these formats supports animation?
2. What’s the difference between lossy and lossless compression?
3. Which of these formats is lossy-compressed?

Read up at the MIT’s article : http://cdn.cs50.net/2013/fall/psets/5/garfinkel.pdf
And try to answer these questions on ~/Dropbox/pset5/questions.txt.

4. What happens, technically speaking, when a file is deleted on a FAT file system?
5. What can someone like you do to ensure (with high probability) that files you delete cannot be recovered?


Anyway, you have to do 3 parts in this problem set and they are :

  • whodunit.c
  • resize.c
  • recover.c

WHODUNIT and RESIZE

http://d2o9nyf4hwsci4.cloudfront.net/2014/x/psets/5/pset5/pset5.html#_whodunit_and_more

WHODUNIT

We’re going to find out. In this game clue, you might be given a little red image. And the image is very red and spotty. Your job is to revealed the hidden message among this file’s red noise is a drawing whodunit.

rednoise

In whodunit, you are given a Bitmap image that was very spotty and red. And run the whodunit program to reveal the hidden message. So, let’s break this problem.

TODO :
1. Open the file
2. Update header info
3. Read clue scanline, pixel by pixel
4. Change pixel color as necessary
5. Write into verdict scanline, pixel by pixel

First, you’ll want to open the file, the clue that you’ve given. And also create a verdict bitmap file. And then you’ll want to update the bitmap header info for the verdict outfile. More on that later. And then, you’ll wan to read into the clue, scanline pixel by pixel, change pixel color as necessary and write into verdict scanline pixel by pixel. How do we start it? Luckily, we have a copy.c in the distribution code and this is going to prove quite useful to us. Here is the sneak peak of copy.c

openfile
And, then read infile and update the outfile’s header

readinfileandupdate
Then, read each pixel

read each pixel
And then, write the pixel into the outfile:

fwrite(&triple, sizeof(RGBTRIPLE), 1, outptr);

So, your first step might be to run the following command in the terminal:

cp copy.c whodunit.c

This would create a copy from a copy.c named whodunit.c

Pseudocode for whodunit.c :
1. Open the file (still same like copy.c , so no need to change anything)
2. Update header info (still same)
3. Read clue scanline, pixel by pixel (still same)
4. Change pixel color as necessary
5. Write into verdict scanline, pixel by pixel
Ok, I’ll walk you thru it.

1. Opening files :

FILE* inptr = fopen("foo.bmp", "r");

Opens foo.bmp for reading that represented by “r“.

 FILE* outptr = fopen("bar.bmp", "w"); 

Opens bar.bmp for writing that represented by “w“.

Next step, we’re going to update header info. What’s the header info ? Well, first we have to know what the bitmap is.

offset
Open up the bmp.h on gedit.
What does the structure of BITMAPFILEHEADER contain ? The BITMAPFILEHEADER structure contains information about the type, size, and layout of a file that contains a DIB [device-independent bitmap].

bitmapheaderinfo

Then, what does the structure of BITMAPINFOHEADER contain ? The BITMAPINFOHEADER structure contains information about the dimensions and color format of a DIB [device-independent bitmap].
bitmapinfoheader

Header :

  • biSizeImage : is the total size of image (in bytes) and include pixel and padding. More on those later.
  • biWidth : is the width of image (in pixels) , this doesn’t incude padding.
  • biHeight : is the height of image (in pixels).

There are 2 structs on Header : BITMAPFILEHEADER and BITMAPINFOHEADER.

So, how do we update the header info? We first want to see is anything changing in the clue (the infile) to the verdict (the outfile).
2. Updating Header Info :

  • New bmp → new header info
  • Is anything changing ? Well, no actually. Because we’re going to changing the color.
    File size (not changing).
    Image size (not changing).
    Width (not changing).
    Height (not changing).

So, you’re right for now by just copy it. Next, how do we read every pixel ?
3. Reading file :

fread(&data, size, number, inptr);

&data : is a pointer to a struct which will contain the bytes you’re reading.
size : size of each elements to read , here the function sizeof will coming handy.
number : is a number of each elements to read.
inptr : FILE* to read from
Then, let’s change the color as we need :

rgbtriple
As you can see, inside of the rgb structure, there are rgbtBlue, which is the amount of blue, rgbtRed which is the amount of red, and rgbtGreen which is the amount of green.

4. Changing pixel color :
Each color represented by 3 bytes :

  • amount of blue
  • amount of red
  • amount of green
    ff0000 blue
    ffffff → white

So, can you change the color to green ? You will want to call RGBTRIPLE (a struct to represent pixel). By using dot (.) operator.
See example below :

RGBTRIPLE triple;
triple.rgbtBlue = 0x00;
triple.rgbtGreen = 0xff;
triple.rgbtRed= 0x00;

With that, it gives me a green pixel. Next, what if I want to check the value of something?
See example below :
If you’d like check the value of blue. You may execute :

if (triple.rgbtBlue == 0xff)
{
   printf("I’m feeling blue!");
}

For whodunit, let me suggest you to use “remove all RED” method. But still, there are a couple of ways to solve this problem. Ok, so now assume we’ve changed the pixel color. The next step is write into the verdict scanline.

5. Writing into verdict scanline, pixel by pixel :

Again you’ll want to take a look copy.c and see how to writing files.

fwrite(&data, size, number, outptr);

&data : is a pointer to a struct which will contain the bytes you’re reading from.
size
number
outptr : FILE* to write to.

But, you also have to determine the padding. What is padding? Well, every RGBt pixel is 3 bytes long. But, the scanline for bitmap image has to be multiple of 4 bytes. If the number of pixels isn’t multiple of 4, we need “padding”. Padding is just the represented of zero. So, how do we read and write this ? Well, you can’t actually “fread” padding but you can calculate it.

Padding :

  • Padding isn’t a RGBTRIPLE so we can’t fread padding.
  • The clue and the verdict have the same width, so the padding is the same!
padding = (4 - (bi.biWidth * sizeof(RGBTRIPLE)) % 4) % 4;

Now, we have the padding formula. Then, how can we write it? Using “fputc”.

fputc (chr, outptr);

chr : char to write
outptr : FILE* to write to.

fputc(0x00, outptr);

Then, you’ll want to move file position indicator. Now, you’ll want to use “fseek”.

fseek(inptr, amount, from);

inptr : FILE* to seek in.
amount : number of bytes to move cursor.
from : relates to reference points.

  • SEEK_CUR : (current position in file).
  • SEEK_SET : (beginning of file).
  • SEEK_END : (end of file).

So, we’ll want to use fseek to skip over the padding of infile. So, once you open verdict, you can see who’s the creator of this problem. You’re done whodunit!

RESIZE

Alright, next challenge! Implement now in resize.c a program called resize that resizes 24-bit uncrompressed BMPs by a factor of n. Your program should accept exactly three-command line arguments, per the below usage, whereby the first(n) must be a positive integer less than or equal to 100, the second the name of the file to be resized, and the third name of resized version to be written.

Usage : ./resize infile outfile

With a program like this, we could have created large.bmp out of small.bmp by resizing the latter by a factor of 4 (i.e multiplying both its width and its height by 4), per the below :

./resize 4 small.bmp large.bmp

You’re welcome to get started by copying (yet again) copy.c and naming the copy resize.c. But spend some time thinking about what it means to resize a BMP.

TODO :

  1. Open file
  2. Update header info for outfile
  3. Read each scanline pixel by pixel
  4. Resize horizontally
  5. Remember padding!
  6. Resize vertically
  • Opening file

You know how to do that! It’s still same and also you’re welcome to get copying again copy.c by execute :

cp copy.c resize.c
  • Updating header info for outfile

Because we have new BMP we also have to update the header info. What’s changing ? File size, image size, width and height are have to change. Try to look at the header info (again) :
Header :

  • biSizeImage : is the total size of image (in bytes) and include pixel and padding.
  • biWidth : is the width of image (in pixels) , this doesn’t include padding.
  • biHeight : is the height of image (in pixels).
  • There are 2 structs on Header : BITMAPFILEHEADER and BITMAPINFOHEADER
  • Reading scanline pixel by pixel

Again, you’ll wan to call file I/O function “fread”. Look at below (again) :

fread(&data, size, number, inptr);

&data : is a pointer to a struct which will contain the bytes you’re reading.
size : size of each elements to read , here the function sizeof will coming handy.
number : is a number of each elements to read.
inptr : FILE* to read from.

  • Resizing horizontally

If I could assume a pixel as a block . How do we resize it horizontally ? well, let’s assume :

n = 2

Then, the block should be two times of the current width. OK, this time you’ll want to use “fwrite” function.

fwrite(&data, size, number, outptr);

&data : is a pointer to a struct which will contain the bytes you’re reading from.
size
number
outptr : FILE* to write to.
Can you figure out how? I leave you to determine it by yourself.

  •  Remember padding.

Do you remember how ? If don’t please take a look at the previous explanation about padding on whodunit.

  • Resize Vertically

Several ways to do this :

  1. “Rewrite” methods :
    • Remember all of the pixels in a array
    • And Write as many as needed

or

2. “Re-copy” methods :
• Go back to the start of the original row
• Repeat the horizontal resizing.

So, we’ll understand resize.c is all about :
1. Each pixel repeated n times
2. Every row repeated n times
With that you’ll have a Bitmap large image! If you’d like to check the correctness of your program with check50, you may execute the below.

check50 2014/x/pset5/resize bmp.h resize.c

If you’re done with it, be sure you can answer the following questions in ~/Dropbox/pset5/questions.txt.

6. What’s stdint.h ?
7. What’s the point of using uint8_t , uint32_t , int32_t , and uint16_t in a program?
8. How many bytes is a BYTE , a DWORD , a LONG , and a WORD , respectively? (Assume a 32-bit architecture like the CS50 Appliance.)
9. What (in ASCII, decimal, or hexadecimal) must the first two bytes of any BMP file be? (Leading bytes used to identify file formats(with high probability) are generally called “magic numbers.)”
10. What’s the difference between bfSize and biSize ?
11. What does it mean if biHeight is negative?

12. What field in BITMAPINFOHEADER specifies the BMP’s color depth (i.12. e., bits per pixel)?
13. Why might fopen return NULL in copy.c:37 ?
14. Why is the third argument to fread always 1 in our code?
15. What value does copy.c:70 assign padding if bi.biWidth is 3 ?
16. What does fseek do?
17. What is SEEK_CUR ?


 CSI (Computer Science Investigation)

Alright, so let’s pull up all your new skill into the last. Take a look at http://cdn.cs50.net/2014/x/psets/5/pset5/pset5.html#_csi_computer_science_investigation

Long story shorts, in this problem you have to recover 50 photos that’ve been deleted. Fortunately, in the Computer world “deleted” tends not to mean “deleted” so much as “forgotten”. Let’s make a file called recover.c inside the ~/Dropbox/pset5/jpg a program that will recover all photos.

What is the format for photos? Well, it should be a JPEG. You’ll know that JPEGs are more complicated than BMPs but luckily JPEGs have signatures patterns of bytes. Specially, the first four bytes of most JPEGs are either :

0xff 0xd8 0xff 0xe0
or
0xff 0xd8 0xff 0xe1

For simplicity, you should hard code ”card.raw” in your program, your program need not accept any command-line arguments. When executed, your program should recover every one of JPEGs from “card.raw“. Your program should number the files its outputs by naming each ###.jpg where ### is three-digits decimal number from 000 on up (you’ll want to call sprintf function). How to check whether your program is correct? Well, just simply double-click and take a look! If the photo are appears, your operation was likely a success!
Okay, let’s get started.

TODO :
1. Open memory card file
2. Find the beginning of JPG
3. Open a new JPG
4. Write 512 bytes until new JPG is found
5. Detect end of file

  • Opening memory card file

You know how to solve this already! Don’t forget to use File I/O function.

  • Find the beginning of JPG

Well, just like a BMP, JPEG is also just a sequence of bytes. As you’ve read above, the signatures of JPEGs are either :

0xff 0xd8 0xff 0xe0
or
0xff 0xd8 0xff 0xe1

And, it’s stored side by side in the memory card. How can do we read the 512 bytes? Well, let’s go back to “fread” struct.

fread(&data, size, number, inptr);

&data : buffer.
We have to read 512 bytes at the time. Can you figure out how? You’ll want to define BUFFER and storing it by 512.

  • Opening a new JPG

Since you’ve reached the first JPG, then how do we open it? Let’s make the new JPG. The file name of JPG is ###.jpg the ### is going to be number format. So, the first jpeg you’ve found is 000.jpg and so forth. So keep track!

Then, how do you actually make the jpg? You’ll want to call “sprintf” function. It’s a bit similar with“printf”. But, in this case, “sprintf” will print the file out in the current directory and not in the terminal.

sprint(title, "%d.jpg", 2);

title : char array to store the resultant string
%d.jpg : will give you name like 2.jpg.. so, how do you make it to be 002.jpg?

  • Write 512 bytes until new JPG is found

Remember “fwrite

  • Detect end of file

Ok, so it’s a bit confusing now. Let me get you the pseudocode for recover.c.
Pseudocode :

open card file
    while eof check
         fread first 512 bytes
         if signature is JPEG
              check if outptr != null
              close output file
              update file name counter
            create new filename
            open file
            write the current 512 bytes to the outfile
         else if signature is not JPG && outptr != null
            fwrite current 512 bytes to outfile
close any remaining file

With that you’ve recovered all of the photos! Congratulations! If you’d like to check your correctness of your program with check50, you may execute the below :

check50 2014/x/pset5/recover recover.c

And you’re done!

This was Problem Set 5.

Problem Set 4 : Breakout

Challenge for this problem set is to implement a game. The game is Breakout. Problem Set 3 was also a game, but its graphical user interface (GUI) wasn’t exactly a GUI, it was textual user interface, since we essentially simulated graphic with printf. Let’s give Breakout actual GUI by building atop the Stanford Portable Library (SPL), which is similar spirit with CS50 library but include an API (application programming interface) for GUI programming and more.

Unfortunately, I have no pseudocode for this problem set since you have all you need in the Stanford cslib package.

Take your time to look at the specifications and have fun!

If you’re done with it. You’ll see you’ve made a game. Because this game expects a human to play, no check50 for this one!

Be sure you’re inside the Problem Set 4’s directory by execute :

cd ~/Dropbox/pset4

You’ll see :

jharvard@appliance (~/Dropbox/pset4):

Then, execute :

jharvard@appliance (~/Dropbox/pset4): ./breakout

And, viola!

breakout

You’re ready to play!

This was Problem Set 4.

Problem Set 3 : Game Of Fifteen

Be sure to check the previous post before getting started to read this.

Alright! Welcome back. This is CS50! So, recall in the past days we’ve been spending a quite bit of time on C, on programming, on syntax. And, it’s quite normal if you still struggling about Problem Set 2, to banging your head again on the wall. It’s cryptic-error messages and bugs that you can’t quite chase down. So, if that any consolation hang in there for now. If you can solve it then, can you guess what the string actually says when I typed “ X Lvne Noh! “? Fantastic!

Today, though, we begin to transition to higher level which is more complex but actually it can help you more comfortable in programming and how we can go about optimizing and efficiency of our algorithm and generally solving more interesting problem. How can we go coding effectively by using running time. This problem set will introduce you to larger programs with multiple source files. Accustom you to reading someone else’s code, empowering you with Makefiles, and implement a party favor.

You’ll be more comfortable with :

  • Array(s)
  • Searching algorithm
  • Sorting algorithm
  • gdb

And anyway, you have to do 2 parts of this problem set :

  1. find
  2. fifteen

Recommend Watching :
Lecturer : https://www.youtube.com/watch?v=YxgI7ll4Xtg
Binary search : https://www.youtube.com/watch?v=D5SrAga1pno
Linear search : https://www.youtube.com/watch?v=CX2CYIJLwfg
Bubble sort resources : https://www.youtube.com/watch?v=8Kp-8OGwphY
Insertion sort : https://www.youtube.com/watch?v=DFG-XuyPYUQ
Selection sort : https://www.youtube.com/watch?v=f8hXR_Hvybo
GDB : https://www.youtube.com/watch?v=USPvePv1uzE
Recommend Reading :
Page 17 of http://www.howstuffworks.com/c.htm

Phew, so many videos! And so many pages to read! If you’re done with it, be sure you can anwer these questions when it comes to submit this problem set’s form!

1. Why does binary search require that an array be sorted?
2. Why is bubble sort in O(n) ?
3. Why is insertion sort in Ω(n)?
4. What’s the worst-case running time of merge sort?
5. In no more than 3 sentences, how does selection sort works?


Getting Started.

For problem set 3, you’ll instead download “distribution code”(otherwise known as “distro”), written by cs50 team, and add you own line of code to it. You’ll first need to read and understand their code, though, this problem set is as much about learning to read someone else’s code as it is about writing your own!

Let’s get you started. Go ahead and open up terminal window. Then execute :

update50

To make sure your appliance is up-to-date. Then execute

cd ~/Dropbox

Followed by

wget http://cdn.cs50.net/2013/fall/psets/3/pset3/pset3.zip 

to download a ZIP of this problem set’s distro into your appliance (with command-line program wget). You should see a bunch of output followed by:

‘pset3.zip’ saved

Ultimately confirm that you’ve download pset3.zip by executing :

ls

Then, run

unzip pset3.zip

To unzip file. If you run ls again, you should see that you have newly unzipped directory called pset3 as well. Proceed to execute :

cd pset3

Followed by

ls

And you should find see that the directory contains two “subdirectories” :

fifteen find


Searching Algorithm

Searching algorithm is “ an algorithm for finding an item with specified properties among a collection of items. The items may be stored individually as records in a database; or may be elements of a search space defined by a mathematical formula or procedure, such as the roots of an equation with integer variables; or a combination of the two”

http://en.wikipedia.org/wiki/Search_algorithm

Sorting

A Sorting algorithm is “an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) which require input data to be in sorted lists; it is also often useful for canonicalizing data and for producing human-readable output”
http://en.wikipedia.org/wiki/Sorting_algorithm


Okay, let’s dive into the first of those subdirectories.

Open up the terminal and execute :

cd ~/Dropbox/pset3/find

You should see below :

helpers.c helpers.h Makefile find.c generate.c 

Phew, that’s a lot of files! Take a look at each file. And try to undersand what do they mean.

And again, please read the spec first! http://d2o9nyf4hwsci4.cloudfront.net/2014/x/psets/3/pset3/pset3.html#_getting_started

Let’s begin the first problem “find”.

Find

  • Prompt for numbers to fill the haystack
  • Searches the haystack for a needle
    – call sorts and searches, function defined in helper.c
    helpers.c was the helper function in Problem Set 3. You’ll need to complete the fully function on it.

TODO :

  1. Search
  2. Sort

Open up the terminal and execute :

cd ~/Dropbox/pset3/find

To make sure you are in a right subdirectory. Then, open up gedit and open helpers.c.
Not sure where to begin?
Anyway,you’ll want to implement a searching and sorting algorithm.

Search

  1. Return true if value is found at haystack
  2. Return false if values is not found at haystack

Linear search : you can do better!
• O(n) → slow
• Can search any list
Binary search :
• O(log n) → fast
• Can only search sorted list
Binary search pseudocode :

bool search(int value, int values[], int n)
 while length of the list > 0
    look at the middle of list
    if number found, return true
    else if number higher, search left
    else if number lower, search right
 return false

Sort

Sort the values[] array

void sort(int values[], int n)

Selection sort pseudocode :

for i = 0 to n-1
  for j = i + 1 to n
    find index of minimum value
  swap array[min] and array[i]

Bubble sorts pseudocode :

- iterate over the list
- compare adjacent elements
- swap elements that are in the wrong order
- largest elements will ‘bubble’ till the end
- the list is sorted once no elements have been swapped

And, you’re done the first problem! Conguratulations!


The Game Begins

Now the second problem is a Game Of Fifteen puzzle played on a square, two-dimensional board with numbered tiles that slide. The goal of this puzzle is to arrange the board’s tiles from smallest to largest, left to right, top to bottom, with an empty space in board’s bottom-right corner as in the below :

boardtrue

You should make your “move” (i.e that borders the board’s empty space)with sliding any tile from smallest to largest, left to right, top to bottom. Although the configuration of board above depicts the game is already won. You should not make any move diagonally or forcibly removed from the board. If, however and only if the board contain odd number (i.e the height and weight of board are even), the position of number 1 and 2 must be swapped. It solvable if the configuration as see below :

solvable-board

Navigate your way to :

~/Dropbox/pset3/fifteen

And take a look at fifteen.c with gedit.
Read over the code and comments in fifteen.c and be sure you can answer the questions below before proceeding further.

  1. Besides 4 x 4 (which are game of fifteen’s dimension), what other dimension does the framework allow ?
  2. With what sort of data structure is the game’s board represented?
  3. What function do you apparently need to implement?

You’ll want to take “baby steps” to implement this game. Four things that you’ll need to implements are :

  1. Init
  2. Draw
  3. Move
  4. Won
15 14 13 12
11 10 9  8
7  6  5  4
3  1  2  _

Recall that the positions of tiles numbered 1 and 2 should only start swapped as they are in 4 x 4 as example above. But they do not swapped in 3 x 3 example below :

8  7  6
5  4  3
2  1  _

INIT

void init(void)

Initializes the game’s board with tiles numbered 1 through d*d – 1 (i.e., fills board with values but does not actually print them), whereby board[i][j] represents row i and column j.

Board represented by 2D dimensional array.

2d

Init pseudocode :

board should contain the starting state
   • board[i][j] represents the elements at row[i] col[j]
Starts in descending order
   • if the number tiles odd, swap 2 and 1

DRAW

void draw(void)

Prints the board in its current state. Then, print the blank space before single-digit #s using :

printf("%2d", board[i][j]);

Draw pseudocode :

for each row
  for each value in row
     print value and space
  print new line

MOVE

bool move(int tile)

If tile borders empty space, moves tile and returns true, else returns false.

Move pseudocode :

Accept the number of tile to move
  Not its positions!
  Search for the tile’s position.
Remember where the blank tile is now

WON

bool won(void)

Returns true if game is won (i.e., board is in winning configuration), else false. Game is won when tiles are in increasing order.

1  2  3
4  5  6
7  8  _

Won pseudocode :

  • Iterate over board and check the values
  • If any values in incorrect, return false

Viola, you’ve finished Game Of Fifteen. To test your implementation of fifteen, you can certainly try playing it.(No check50 for this one). Know that program to quit by hitting ctrl-c. To test your program with, say, the first of those inputs, execute the below.

./fifteen 3 < ~cs50/pset3/3x3.txt

Now, you’re done with both Find and Game Of Fifteen!

This was Problem Set 3.

Problem Set 2 : Crypto

Hello, world! Congratulations, you’ve completed problem set 1! Now, we’re getting started into Problem Set 2. What was the problem? In this problem set, you’ll study about Cryptography , which was very challenging and exciting at the same . You will be among the “more comfortable” if you can complete this pset. Anyway, you have to complete 2 parts in it :

  1. Caesar Cipher
  2. Vigenere Cipher

Then we’re gonna talk about “What is Cryptography?“ (http://en.wikipedia.org/wiki/Cryptography).

“Why did we do this?” Well, it’s very useful. Say, if you want to send a message to your crush but you don’t want others see what exactly you said. So we can keep secret from each other what the message actually says by using Cryptography. Since, the actual string will be covered by the key which can make the primary text hiding. For example : “Can I have your number ? “ you can use 11 as the secret key, and it will say “Nly T slgp jzfc yfxmpc ?” instead of above. Who knows what you said then? But, how’s work ? That’s simple. Ok, I used 11 for the key, the letter “C” inside of the word “Can” was changed by “N” which was 11 letters away from “C”. Can you figure out the next letters? It’s really fun!

The parts of Cryptography are :

  1. Key (secret position a non- negative integers) see : http://en.wikipedia.org/wiki/Key_%28cryptography%29
  2. Ciphertext (Encrypted message) see : http://en.wikipedia.org/wiki/Ciphertext
  3. Plaintext (Unencrypted message) see : http://en.wikipedia.org/wiki/Plaintext

By the way, an ASCII characters are very useful in order to do this pset.

ascii

(image : http://commons.wikimedia.org/wiki/File:ASCII-Table.svg)


Alright, here we go!

Open up the terminal inside of the appliance then execute :

update50

To make sure that your appliance is up-to date.

Next, execute :

mkdir ~/Dropbox/pset2

This was a command for make new directory called “pset2” inside of “Dropbox”.

Then, execute :

cd ~/Dropbox/pset2

To make sure that you move yourself into that directory. Your prompt now should be like :

jharvard@appliance (~/Dropbox/pset2) : 

  1. Caesar cipher.

Have you watched short videos ? If haven’t , be sure to watch it first (https://www.youtube.com/watch?v=36xNpbosfTY) it will be helpful. Recall from the video’s short that Caesar’s cipher encrypts (i.e scrambles in a reversible way) messages by rotating each letter by “k” positions, wrapping around “z” to “A” as needed (http://en.wikipedia.org/wiki/Caesar_cipher) in other words, if “p” is some plaintext (i.e unencrypted message), “pi” is the i-th character in “p”, and “k” is the secret key (i.e non negative integer), then each letter “ci” in the ciphertext “c” is computed as :

ci = (pi + k) % 26

ci = i-th letter of ciphertext.

pi = i-th letter of plaintext.

k = the key.

%26 = remainder after dividing by 26.

This formula perhaps makes the cipher seem more complicated than it is, but it’s really nice way of expressing the algorithm precisely and concisely.

Anyhow, your goal is to write caesar.c, a program that encrypts message by using Caesar Cipher.

Where to begin?

  • This program needs to accept a command-line argument, “k” so this time you’ll want to declare main with :
int main(int argc, string argv[])

The “argv” is an array of strings. Keep in mind that arrays are zero-indexed. To access the next letter, you use syntax like argv[1]. With that you can access for each letter with the code like :

string k = argv[1];
  • Your program must a single command line-argument. If your program is executed without any command line argument, your program should yell at the user and return value of 1 :
return 1;
  • The users must type an integer at the prompt, that doesn’t mean their input will be stored as an int. It will stored as a string. You’ll need to convert that string to an actual int. As luck, we would have a function atoi http://www.cs50.net/resources/cppreference.com/stdstring/ exists for exactly that purposes. Here’s how you might use :
int k = atoi(argv[1]);

But don’t forget to add the library #include <stdlib.h> because atoi is declared there.

  • OK, so once you’ve got “k” stored as an int. You’ll need to ask the user for some plaintext. GetString() can help you with that.
  • You can iterate over the characters in a string, printing each one at the time, with code like below :
for (int i = 0, n = strlen(p); i < n ; i++)
{
     printf("%c", p[i]);
}

Incidentally, you’ll want to add another header file #include in order to use strlen http://www.cs50.net/resources/cppreference.com/stdstring/.

Here you go with the pseudocode !

Pseudocode :

  1. Get the key
  2. Get the plaintext
  3. Encipher
  4. Print Ciphertext

If you’re done with that. So we can automate some tests of your code, your program must behave per the below. Assumed that the underline text is what some user has typed.

jharvard@appliance (~/Dropbox/pset2) : ./caesar 13
Be sure to drink your Ovaltine!
Or fher gb qevax lbhe Binygvar!

If you’d like to check the correctness of your program with check50, you may execute the below:

check50 2014/x/pset2/caesar caesar.c

Congrats! You’ve completed Caesar!

  1. Vigenere cipher.

This last cipher of this problem set was hardly secure. Be sure to watch Vigenere’s short http://cs50.tv/2012/fall/shorts/vigenere_cipher/vigenere_cipher-720p.mp4 there was more sophisticated algorithm out there. It’s French  http://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher.

Vigenere’s cipher improves upon Caesar’s by encrypting messages using a sequence of keys, in other word, if “p” is a plaintext and “k” is a keyword, then each letter “ci” in the ciphertext “c” is computed as :

ci = (pi + kj) % 26

ci = i-th letter of ciphertext.

pi = i-th letter of plaintext.

kj = j-th letter of the key.

%26 = remainder after dividing by 26.

Note this cipher’s use of “kj” as opposed to just “k”. And recall that. If “k” is shorther than “p”, then the letter in “k” must reused cyclically as many times as it takes to encrypt “p”.

Your next challenge is to write in vigenere.c, a program that encrypts messages using Vigenere’s cipher.

Pseudocode :

Get the key

  • 2nd command line argument : argv[1]
  • Must be alphabetical : isalpha

Get plaintext

  • GetString()

Encipher

  • Advance the next letter in keyword only if the character is a plaintext letter
  • isalpha
  • Preserve case
  • isupper, islower
  • Keep track of :
  • Position in plaintext
  • Position in keyword

In two separate variables.

Print ciphertext

With that, you’ll complete Vigenere cipher. So we can automate some tests of your code, your program must behave per the below :

jharvard@appliance (~/Dropbox/pset2): ./vigenere bacon 
Meet me at the park at eleven am 
Negh zf av huf pcfx bt gzrwep oz

If you’d like to check the correctness of your program with check50, you may execute the below:

check50 2014/x/pset2/vigenere vigenere.c

And, viola! You’ve completed Caesar and Vigenere cipher! Congratulations!

This was Problem Set 2.

Problem Set 1 : Introducing C

The C program you write is called Source Code. A compiler takes C source code and translates that code into machine language. Computers are made up of nothing more than thousands of electrical switches that are either on or off. Therefor, computers must ultimately be given instructions in binary. The prefix bi means two And the two states of electricity are called binary states. It’s much easier to use a C compiler to convert your C programs into 1s and 0s that represent internal on and off switch setting than for you to do it yourself.

The Compiling process :

pseudocode
The pseudocode and source code are done by you. Then, the compiler takes the source code and give a value which is called an object code.
Example :
– Writing a program that says “hello”

Before start the programs, be sure that you have cs50 appliance in your computer. Open up your gedit and let’s do this.
Not sure where to begin? See the hints below.
a. Pseudocode

  • Start Program
  • Print “hello”
  • Quit

b. Compiling

  • Source code in C :

hello
c. Object Code
hello binary

By the way, how to compile your program? Great question!

You will want to call ‘make’. You can compile your program by typing :make hello
Then if you can see the below, it means your program is compiled.make hello2

Or, you will want to call ‘clang’ by typing :

clang

If you’re done with this, let’s run the program by typing : run hello

And, viola! “hello” is printed!

run hello 2


Now, let’s talk about Conditions and Boolean expression in C.
These points are very important to understand since the Problem Set 1 will give you some basics of programming in C which will more complex in the next psets ahead.

1. Conditions :

a. If Statement

conditon

As you can see above the program said that if you have “a number less than zero”, your program should print a message such as “You picked a negative number!” and if you picked a number greater than zero, your program won’t print the message. That means your “condition” is going perfectly.

b. If – else statement

condition2

c. If – else if – else statement

condition3

d. If – if statement

condition4

Can you guess how those conditions work ?

2. Boolean Expressions

boolean

Combining Boolean Expressions :

combine boolean

3. Loops

a. For Loops (initialization, condition, update)

for-loop
Prints “This was CS50″ five times –  How it works? First, initialize variable(s) in this case ” int i “. Then check the condition ” i < 5 ” , after that update variable(s) ” i++ ” then the code inside the body will execute ” prints This was CS50 ” five times. If false exit the loop.

b. Do- While Loop
do-whileloop

Reprompts until user enters a positive number – How it, works? First, we declare variable(s), then execute code in the body “Please enter a positive number” then check the condition “enter < 1” if it is true, the code inside the body will execute if false, the loop will exit.

c. While Loop
while-loop

Counts down from 15 to 0 – How it works? First, declare variable(s) “int count = 15” then check the condition “count >= 0” if it is true, it will execute code in the body. If false, exit the loop.

4. Numerical Variables

  1. Int
  2. Float
  3. Long long
  4. Double

Watch the section : https://www.youtube.com/watch?v=fUhiHTHtYa4


Now, time to tackle Problem Set 1 :
a. Mario
b. Greedy

Mario

mar

Write in a file called mario.c in your ~/Dropbox/pset1 directory. A program that recreates half-pyramid using hashes(#) for blocks.

1. Prompt the user for half-pyramid’s height, a non-negative integer no greater than 23. If the user fails to provide a non-negative integer no greater than 23, you should re-prompt for the same again. With the help of do-while loop.

int n;
 do
 {
      n=GetInt();
 }
 while(n is invalid);

2. Draw half pyramids, you will want to call printf and one or more for loop. As the pyramid is in right align, you will want to print spaces aswell. Find the pattern!

Example :
Height = 8;
First Row = 7 spaces , 2 hashes
Second Row = 6 spaces, 3 hashes
Third Row = 5 spaces, 4 hashes
………………
n row : how many spaces?
How many hashes?

mario

Pseudocode :
For every row :
– print spaces
– print hashes
– print new line
Using for loop?
Keep in mind that the structure of for loop is should be like :

 for(initialization; condition; update)
 {
     //code inside
 }
 //code outside
for(int i = 0; i<height; i++)
 {
     print spaces
     print hashes
     print new line
 }
 //code outside

How about using while – loop? This is computer science, so many ways to solve this problem set. Your while-loop should be like :

 initialization;
 while(condition)
 {
   // code inside
   update;
 }

You should compile and run your program. Then, you’ll complete Mario. We first check for user input, and identify the pattern for each row from zero to n minus one , we print the spaces and hashes according to the pattern. With that, you’ll have a pyramid!

Greedy

Read the spec first  ” http://d2o9nyf4hwsci4.cloudfront.net/2014/x/psets/1/pset1/pset1.html#_time_for_change “. And let’s tackle greedy!
1. Prompt the user input for amount of change
2. Always use largest coin possible
3. Keep track of coins used
4. Print the final amount of coins

First, prompt the user input :

  • The amount must make sense
    Keep in mind that the amount should be a non-negative number and larger than zero.
  • What values are accepted?
  • The input is a value in dollars.
  • Use floating point imprecision.
  • How to convert dollars to cents? Use round() function. In the terminal type : man round.

Pseudocode :

  1. Get amount in dollars
  2. While(quarters can be used)
    increase count
    amount decrease by a quarter
  3. While(dimes can be used)
    increase count
    amount decrease by a dime
  4. While(etc)
    ….etc
  5. Print number of coins used

Now, compile and run the program!
jharvard@appliance (~/Dropbox/pset1) : make greedygreedy
Congrats, You’re done greedy.c!

This was Problem Set 1.