D41d8cd98f00b204e9800998ecf8427e

I am new to C. Following code extracts directory path, filename and extension. I am not sure this is in a good shape. All I know is this code works. Any suggestions that can improve the code are welcome!

void splitpath(const char *path, char *directory, char *filename, char *extension)
{
    memset(directory, 0, strlen(directory));
    memset(filename, 0, strlen(filename));
    memset(extension, 0, strlen(extension));

    int filename_start_index = 0;
    int extension_found = 0;
    const char *last_directory_separator = strrchr(path, '/');
    if(last_directory_separator != NULL) {
        filename_start_index = last_directory_separator - path + 1;
    }
    int i;
    for(i = 0; i < strlen(path); i++){
        char temp[] = { path[i], '\0' };
        if(i >= filename_start_index) {
            if(extension_found || path[i] == '.') {
                extension_found = 1;
                strcat(extension, temp);
            }
            else
                strcat(filename, temp);
        }
        else {
            strcat(directory, temp);
        }
    }

}
int main()
{
    char directory[200], filename[200], extension[200];  /* I am aware about buffer overflow. This is just for testing */
    splitpath("/usr/bin/some.txt", directory, filename, extension);

    printf("Directory = %s\n", directory);
    printf("FileName = %s\n", filename);
    printf("Extension = %s\n", extension);

    splitpath("some.txt", directory, filename, extension);
    printf("\n");
    printf("Directory = %s\n", directory);
    printf("FileName = %s\n", filename);
    printf("Extension = %s\n", extension);

    splitpath("src/some.tar.gz", directory, filename, extension);
    printf("\n");
    printf("Directory = %s\n", directory);
    printf("FileName = %s\n", filename);
    printf("Extension = %s\n", extension);
    return 0;
}

Refactorings

No refactoring yet !

B04fb5484756984ac7875f37e5cb3320

heretic

July 4, 2010, July 04, 2010 00:37, permalink

2 ratings. Login to rate!

First of all, your memset codeblock is... dangerous to say the least. There is no guarantee that strlen(char *) will return the entire length of an uninitialized character buffer. The first character may have a null terminator, or you could continue on in address space forever looking for a null terminator. At which point you'll end up writing zero's to your memory until you managed to either a) segfault or b) overwrite your code space.

Second, whenever working with char *'s in C code, it's usually customary to write functions that take the char * AND a "length" parameter. Never assume your arguments are properly null terminated.

Third, if you are working with related variables such as "directory", "filename", and "extension", you should *really* refactor your code to group those attributes into one logical struct. You never know when you may want to add other attributes (such as checksum, modified date, permission flags, etc etc).

In all honesty, I won't even begin to touch that code. Take these suggestions into consideration first. Then when you're ready, post something fun to refactor :P

D41d8cd98f00b204e9800998ecf8427e

Navaneeth

July 5, 2010, July 05, 2010 07:43, permalink

No rating. Login to rate!

@heretic : Excellent suggestions! Here is the updated code. Following are the changes made.

1 - Moved the memset to the calling code. I believe this will be the correct place as caller is the best person to know about the length. "splitpath" method requires a valid null terminated strings to work. That is the contract.

2 - This method is an internal one and not part of the public interface. So I will make sure this method gets valid null terminated C strings always. I guess I don't have to pass length then.

3 - I understand your point of grouping directory, filename and extension into a logical struct. Created a structure "path_info" which has 3 char* representing directory, filename and extension. The splitpath method takes a pointer to this struct and writes data to the members. It is callers responsibility to initialize the struct and members.

Please let me know the suggestions.

struct path_info {
    char *directory;
    char *filename;
    char *extension;
};
void splitpath(const char *path, struct path_info *p)
{   
    int filename_start_index = 0, extension_found = 0, path_len = 0;
    path_len = strlen(path);

    const char *last_directory_separator = strrchr(path, '/');
    if(last_directory_separator != NULL) {
        filename_start_index = last_directory_separator - path + 1;
    }
    int i;    
    for(i = 0; i < path_len; i++) {
        char temp[] = { path[i], '\0' };
        if(i >= filename_start_index) {
            if(extension_found || path[i] == '.') {
                extension_found = 1;
                strcat(p->extension, temp);
            }
            else
                strcat(p->filename, temp);
        }
        else {
            strcat(p->directory, temp);
        }
    }
}
void clear_contents(struct path_info *p, int length)
{
    memset(p->directory, 0, length);
    memset(p->filename, 0, length);
    memset(p->extension, 0, length);
}

int main()
{
    char directory[200], filename[200], extension[200];  // I am aware about buffer overflow. This is just for testing  
    struct path_info p = { directory, filename, extension };

    clear_contents(&p, 200);
    splitpath("/usr/bin/some.txt", &p);

    printf("Directory = %s\n", directory);
    printf("FileName = %s\n", filename);
    printf("Extension = %s\n", extension);
}
F9a9ba6663645458aa8630157ed5e71e

Ants

July 5, 2010, July 05, 2010 08:00, permalink

No rating. Login to rate!

In general, I agree with heretic. He's got very good points and suggestions.

I do beg to differ on the second point about "whenever working with char *'s in C code, it usually customary to write functions that take the char * AND a "length" parameter." I think that custom primarily applies to write buffers to make sure that you don't write past the end of the buffer. A secondary good use of this custom is when dealing with untrusted data, but you know the size of the buffer that the untrusted data is in. Most other times, you are forced to assume the string arguments are null terminated, out of convention.

Consider the commonly used library routines strlen() and strrchr().

If your don't know the length of the buffer in the first place, how will you use strnlen() in place of strlen()? If you wanted to replace strlen(argv[0]) with strnlen(argv[0], max), what would you use as the max? You can't rely on 4096, PATH_MAX or FILE_MAX because the Unix standard doesn't define a maximum path length.

As for strrchr(), I couldn't find an equivalent C library function that takes a maximum length. So you are stuck assuming that the input string is null terminated. That or roll your own strrchr() that take a maximum length.

Now back to the code:
- I suggest that you rethink the extension collection logic. Consider what happens when you encounter the filename "Wakko.Yakko.and.Dot.mpg"?
- Consider a more efficient way of assembling the strings in your for loop. Using strcat() to append one character at a time will get really expensive fast if you consider that strcat() needs to scan the target string each time to find the end.

D41d8cd98f00b204e9800998ecf8427e

Navaneeth

July 5, 2010, July 05, 2010 08:36, permalink

No rating. Login to rate!

@Ants - Thanks for the suggestions.

- If I encounter filename like "Wakko.Yakko.and.Dot.mpg", my logic will take "Wakko" as file name and rest everything as extension. I couldn't think about a better way to extract the extension out of it. Do you have any suggestion?

- I understand this. But my function is not called so frequently. So I guess this implementation should be fine. However, I am wondering how you would implement an optimized version?

F9a9ba6663645458aa8630157ed5e71e

Ants

July 5, 2010, July 05, 2010 11:25, permalink

2 ratings. Login to rate!

The same way you found the last '/' using strrchr(), I'd use the same function to find the last '.' in the filename for the extension. Even if your function isn't call frequently consider what happens on the one time that it is called with a path that is 4096 characters long.

This the way, I'd implement SplitPath(). My optimization makes a copy of the original path, and chops it up into path, filename and extension parts. The PATHINFO struct points to the parts.

typedef struct
{
    char * directory;
    char * filename;
    char * extension;
} PATHINFO;

void FreePathInfo(PATHINFO * pathinfo)
{
    if (pathinfo)
        free(pathinfo);
}

// Allocate a block of memory that holds the PATHINFO at the beginning
// followed by the actual path. An extra byte is allocated (+2 instead
// of just +1) to deal with shifting the extension to protect the
// leading '.'. This extra byte also doubles as the empty string, as
// well as a pad to keep from reading past the memory block.
//
// Returns a PATHINFO structure that should be freed using FreePathInfo().
PATHINFO * SplitPath(const char * pathSrc)
{
    size_t length = strlen(pathSrc);
    PATHINFO * pathinfo = (PATHINFO *) malloc(sizeof(PATHINFO) + length + 2);

    if (pathinfo)
    {
        char * path = (char *) &pathinfo[1];    // copy of the path
        char * theEnd = &path[length + 1];      // second null terminator
        char * extension;
        char * lastSep;

        // Copy the original string and double null terminate it.
        strcpy(path, pathSrc);
        *theEnd = '\0';
        pathinfo->directory = theEnd;   // Assume no path
        pathinfo->extension = theEnd;   // Assume no extension
        pathinfo->filename = path;      // Assume filename only

        lastSep = strrchr(path, '/');
        if (lastSep)
        {
            pathinfo->directory = path;     // Pick up the path
            *lastSep++ = '\0';              // Truncate directory
            pathinfo->filename = lastSep;   // Pick up name after path  
        }
    
        // Start at the second character of the filename to handle
        // filenames that start with '.' like ".login".
        // We don't overrun the buffer in the cases of an empty path
        // or a path that looks like "/usr/bin/" because of the extra
        // byte.
        extension = strrchr(&pathinfo->filename[1], '.');
        if (extension)
        {
            // Shift the extension over to protect the leading '.' since
            // we need to truncate the filename.
            memmove(extension + 1, extension, strlen(extension));
            pathinfo->extension = extension + 1;

            *extension = '\0';          // Truncate filename
        }
    }

    return pathinfo;
}
D41d8cd98f00b204e9800998ecf8427e

Navaneeth

July 5, 2010, July 05, 2010 18:07, permalink

No rating. Login to rate!

Excellent @Ants. Very clever use of pointers. Your post was extremely helpful and I learned a lot of stuff. But your code was not giving directory name with '/' at the end (/usr/bin/ was returning as /usr/bin). So I modified it and added memmove() for the directory befoe truncating the path. I got it working.

Other than that, all looks good to me. One minor issue is, it can't handle filename with multiple extensions. Something like, some.tar.gz will give filename as some.tar and extension as .gz. I am not really concerned about this issue, but just letting you know.

Thanks again for your effort. I really appreciate it.

F9a9ba6663645458aa8630157ed5e71e

Ants

July 6, 2010, July 06, 2010 10:32, permalink

No rating. Login to rate!

You're welcome, @Navaneeth. Sorry, I'm stuck in the Windows world, and the convention here is to drop the trailing slash when talking about directories. I forgot about the *nix convention of retaining the the trailing slash.

As for the "multiple extensions", there will have to be some kind of heuristic to determine what is a valid extension versus what is not, and what files can go into what. Consider "this.is.an.example.of.a.java.log.class.zip" Where does the filename end, where does the extension begin? ".a", ".java", ".log", ".class", and ".zip" are all valid extensions. As far as I know a ".log" can't go into ".class", but a ".class" can go into a ".zip", so the extension should be ".class.zip". But what if the filename is "dinosaurs.stuck.in.a.tar.pit"? If you had StuffIt on your system, the extension would be ".a.tar.pit" (because a ".a" can go into a ".tar", which can be compressed by StuffIt into a ".pit" file), but if not, the there shouldn't be any extension at all since the ".pit" extension is kind of useless without something to make use of it.

Ee0505bbd355292778077fb662c88f13

Fu86

July 6, 2010, July 06, 2010 15:15, permalink

No rating. Login to rate!

In my opinion, the extension for some.tar.gz is ".gz" because thats the semantic of the extension. The file some.tar will be compressed with gzip and gets the extension .gz. To make clear that some files or directories gets archived and compressed in one step, the .tgz extension will be used.

928b47489462b9301d8e7026d9b074cf

Maskit

July 29, 2010, July 29, 2010 16:45, permalink

No rating. Login to rate!

it was very interesting to read refactormycode.com
I want to quote your post in my blog. It can?
And you et an account on Twitter?

928b47489462b9301d8e7026d9b074cf

nixen

August 2, 2010, August 02, 2010 14:39, permalink

No rating. Login to rate!

I would like to exchange links with your site refactormycode.com
Is this possible?

06b1a4f425b49ad84c2859ab9c2605cd

refactormycode.com

April 26, 2011, April 26, 2011 18:36, permalink

No rating. Login to rate!

Refactormycode.. He-he-he :)

Refactormycode.. He-he-he :)
D2b535e989a09137f448b60efb2fa25a

NoxNeorysot

July 9, 2011, July 09, 2011 16:06, permalink

No rating. Login to rate!

It is a pity, that now I can not express - it is compelled to leave. I will be released - I will necessarily express the opinion.

55502f40dc8b7c769880b10874abc9d0

https://me.yahoo.com/a/7oSzhTkh2N7Q1QA_M1m0fztGR8V_

October 29, 2011, October 29, 2011 23:36, permalink

No rating. Login to rate!

<a href="http://www.wishes7.com/">wish</a>
thanks for comments

55502f40dc8b7c769880b10874abc9d0

https://me.yahoo.com/a/7oSzhTkh2N7Q1QA_M1m0fztGR8V_

October 29, 2011, October 29, 2011 23:36, permalink

No rating. Login to rate!

<a href="http://www.wishes7.com/">wish</a>
thanks for comments

55502f40dc8b7c769880b10874abc9d0

https://me.yahoo.com/a/7oSzhTkh2N7Q1QA_M1m0fztGR8V_

October 29, 2011, October 29, 2011 23:37, permalink

No rating. Login to rate!

<a href="http://www.wishes7.com/">wish</a>
thanks for comments

<a href="http://www.wishes7.com/">wish</a>
thanks for comment

Your refactoring





Format Copy from initial code

or Cancel