void splitpath(const char *path, char *directory, char *filename, char *extension)
{
memset(directory, 0, strlen(directory));
memset(filename, 0, strlen(filename));
memset(extension, 0, strlen(extension));
int filename_start_index = 0;
int extension_found = 0;
const char *last_directory_separator = strrchr(path, '/');
if(last_directory_separator != NULL) {
filename_start_index = last_directory_separator - path + 1;
}
int i;
for(i = 0; i < strlen(path); i++){
char temp[] = { path[i], '\0' };
if(i >= filename_start_index) {
if(extension_found || path[i] == '.') {
extension_found = 1;
strcat(extension, temp);
}
else
strcat(filename, temp);
}
else {
strcat(directory, temp);
}
}
}
int main()
{
char directory[200], filename[200], extension[200]; /* I am aware about buffer overflow. This is just for testing */
splitpath("/usr/bin/some.txt", directory, filename, extension);
printf("Directory = %s\n", directory);
printf("FileName = %s\n", filename);
printf("Extension = %s\n", extension);
splitpath("some.txt", directory, filename, extension);
printf("\n");
printf("Directory = %s\n", directory);
printf("FileName = %s\n", filename);
printf("Extension = %s\n", extension);
splitpath("src/some.tar.gz", directory, filename, extension);
printf("\n");
printf("Directory = %s\n", directory);
printf("FileName = %s\n", filename);
printf("Extension = %s\n", extension);
return 0;
}
Refactorings
No refactoring yet !
heretic
July 4, 2010, July 04, 2010 00:37, permalink
First of all, your memset codeblock is... dangerous to say the least. There is no guarantee that strlen(char *) will return the entire length of an uninitialized character buffer. The first character may have a null terminator, or you could continue on in address space forever looking for a null terminator. At which point you'll end up writing zero's to your memory until you managed to either a) segfault or b) overwrite your code space.
Second, whenever working with char *'s in C code, it's usually customary to write functions that take the char * AND a "length" parameter. Never assume your arguments are properly null terminated.
Third, if you are working with related variables such as "directory", "filename", and "extension", you should *really* refactor your code to group those attributes into one logical struct. You never know when you may want to add other attributes (such as checksum, modified date, permission flags, etc etc).
In all honesty, I won't even begin to touch that code. Take these suggestions into consideration first. Then when you're ready, post something fun to refactor :P
Navaneeth
July 5, 2010, July 05, 2010 07:43, permalink
@heretic : Excellent suggestions! Here is the updated code. Following are the changes made.
1 - Moved the memset to the calling code. I believe this will be the correct place as caller is the best person to know about the length. "splitpath" method requires a valid null terminated strings to work. That is the contract.
2 - This method is an internal one and not part of the public interface. So I will make sure this method gets valid null terminated C strings always. I guess I don't have to pass length then.
3 - I understand your point of grouping directory, filename and extension into a logical struct. Created a structure "path_info" which has 3 char* representing directory, filename and extension. The splitpath method takes a pointer to this struct and writes data to the members. It is callers responsibility to initialize the struct and members.
Please let me know the suggestions.
struct path_info {
char *directory;
char *filename;
char *extension;
};
void splitpath(const char *path, struct path_info *p)
{
int filename_start_index = 0, extension_found = 0, path_len = 0;
path_len = strlen(path);
const char *last_directory_separator = strrchr(path, '/');
if(last_directory_separator != NULL) {
filename_start_index = last_directory_separator - path + 1;
}
int i;
for(i = 0; i < path_len; i++) {
char temp[] = { path[i], '\0' };
if(i >= filename_start_index) {
if(extension_found || path[i] == '.') {
extension_found = 1;
strcat(p->extension, temp);
}
else
strcat(p->filename, temp);
}
else {
strcat(p->directory, temp);
}
}
}
void clear_contents(struct path_info *p, int length)
{
memset(p->directory, 0, length);
memset(p->filename, 0, length);
memset(p->extension, 0, length);
}
int main()
{
char directory[200], filename[200], extension[200]; // I am aware about buffer overflow. This is just for testing
struct path_info p = { directory, filename, extension };
clear_contents(&p, 200);
splitpath("/usr/bin/some.txt", &p);
printf("Directory = %s\n", directory);
printf("FileName = %s\n", filename);
printf("Extension = %s\n", extension);
}
Ants
July 5, 2010, July 05, 2010 08:00, permalink
In general, I agree with heretic. He's got very good points and suggestions.
I do beg to differ on the second point about "whenever working with char *'s in C code, it usually customary to write functions that take the char * AND a "length" parameter." I think that custom primarily applies to write buffers to make sure that you don't write past the end of the buffer. A secondary good use of this custom is when dealing with untrusted data, but you know the size of the buffer that the untrusted data is in. Most other times, you are forced to assume the string arguments are null terminated, out of convention.
Consider the commonly used library routines strlen() and strrchr().
If your don't know the length of the buffer in the first place, how will you use strnlen() in place of strlen()? If you wanted to replace strlen(argv[0]) with strnlen(argv[0], max), what would you use as the max? You can't rely on 4096, PATH_MAX or FILE_MAX because the Unix standard doesn't define a maximum path length.
As for strrchr(), I couldn't find an equivalent C library function that takes a maximum length. So you are stuck assuming that the input string is null terminated. That or roll your own strrchr() that take a maximum length.
Now back to the code:
- I suggest that you rethink the extension collection logic. Consider what happens when you encounter the filename "Wakko.Yakko.and.Dot.mpg"?
- Consider a more efficient way of assembling the strings in your for loop. Using strcat() to append one character at a time will get really expensive fast if you consider that strcat() needs to scan the target string each time to find the end.
Navaneeth
July 5, 2010, July 05, 2010 08:36, permalink
@Ants - Thanks for the suggestions.
- If I encounter filename like "Wakko.Yakko.and.Dot.mpg", my logic will take "Wakko" as file name and rest everything as extension. I couldn't think about a better way to extract the extension out of it. Do you have any suggestion?
- I understand this. But my function is not called so frequently. So I guess this implementation should be fine. However, I am wondering how you would implement an optimized version?
Ants
July 5, 2010, July 05, 2010 11:25, permalink
The same way you found the last '/' using strrchr(), I'd use the same function to find the last '.' in the filename for the extension. Even if your function isn't call frequently consider what happens on the one time that it is called with a path that is 4096 characters long.
This the way, I'd implement SplitPath(). My optimization makes a copy of the original path, and chops it up into path, filename and extension parts. The PATHINFO struct points to the parts.
typedef struct
{
char * directory;
char * filename;
char * extension;
} PATHINFO;
void FreePathInfo(PATHINFO * pathinfo)
{
if (pathinfo)
free(pathinfo);
}
// Allocate a block of memory that holds the PATHINFO at the beginning
// followed by the actual path. An extra byte is allocated (+2 instead
// of just +1) to deal with shifting the extension to protect the
// leading '.'. This extra byte also doubles as the empty string, as
// well as a pad to keep from reading past the memory block.
//
// Returns a PATHINFO structure that should be freed using FreePathInfo().
PATHINFO * SplitPath(const char * pathSrc)
{
size_t length = strlen(pathSrc);
PATHINFO * pathinfo = (PATHINFO *) malloc(sizeof(PATHINFO) + length + 2);
if (pathinfo)
{
char * path = (char *) &pathinfo[1]; // copy of the path
char * theEnd = &path[length + 1]; // second null terminator
char * extension;
char * lastSep;
// Copy the original string and double null terminate it.
strcpy(path, pathSrc);
*theEnd = '\0';
pathinfo->directory = theEnd; // Assume no path
pathinfo->extension = theEnd; // Assume no extension
pathinfo->filename = path; // Assume filename only
lastSep = strrchr(path, '/');
if (lastSep)
{
pathinfo->directory = path; // Pick up the path
*lastSep++ = '\0'; // Truncate directory
pathinfo->filename = lastSep; // Pick up name after path
}
// Start at the second character of the filename to handle
// filenames that start with '.' like ".login".
// We don't overrun the buffer in the cases of an empty path
// or a path that looks like "/usr/bin/" because of the extra
// byte.
extension = strrchr(&pathinfo->filename[1], '.');
if (extension)
{
// Shift the extension over to protect the leading '.' since
// we need to truncate the filename.
memmove(extension + 1, extension, strlen(extension));
pathinfo->extension = extension + 1;
*extension = '\0'; // Truncate filename
}
}
return pathinfo;
}
Navaneeth
July 5, 2010, July 05, 2010 18:07, permalink
Excellent @Ants. Very clever use of pointers. Your post was extremely helpful and I learned a lot of stuff. But your code was not giving directory name with '/' at the end (/usr/bin/ was returning as /usr/bin). So I modified it and added memmove() for the directory befoe truncating the path. I got it working.
Other than that, all looks good to me. One minor issue is, it can't handle filename with multiple extensions. Something like, some.tar.gz will give filename as some.tar and extension as .gz. I am not really concerned about this issue, but just letting you know.
Thanks again for your effort. I really appreciate it.
Ants
July 6, 2010, July 06, 2010 10:32, permalink
You're welcome, @Navaneeth. Sorry, I'm stuck in the Windows world, and the convention here is to drop the trailing slash when talking about directories. I forgot about the *nix convention of retaining the the trailing slash.
As for the "multiple extensions", there will have to be some kind of heuristic to determine what is a valid extension versus what is not, and what files can go into what. Consider "this.is.an.example.of.a.java.log.class.zip" Where does the filename end, where does the extension begin? ".a", ".java", ".log", ".class", and ".zip" are all valid extensions. As far as I know a ".log" can't go into ".class", but a ".class" can go into a ".zip", so the extension should be ".class.zip". But what if the filename is "dinosaurs.stuck.in.a.tar.pit"? If you had StuffIt on your system, the extension would be ".a.tar.pit" (because a ".a" can go into a ".tar", which can be compressed by StuffIt into a ".pit" file), but if not, the there shouldn't be any extension at all since the ".pit" extension is kind of useless without something to make use of it.
Fu86
July 6, 2010, July 06, 2010 15:15, permalink
In my opinion, the extension for some.tar.gz is ".gz" because thats the semantic of the extension. The file some.tar will be compressed with gzip and gets the extension .gz. To make clear that some files or directories gets archived and compressed in one step, the .tgz extension will be used.
Maskit
July 29, 2010, July 29, 2010 16:45, permalink
it was very interesting to read refactormycode.com
I want to quote your post in my blog. It can?
And you et an account on Twitter?
nixen
August 2, 2010, August 02, 2010 14:39, permalink
I would like to exchange links with your site refactormycode.com
Is this possible?
refactormycode.com
April 26, 2011, April 26, 2011 18:36, permalink
Refactormycode.. He-he-he :)
Refactormycode.. He-he-he :)
NoxNeorysot
July 9, 2011, July 09, 2011 16:06, permalink
It is a pity, that now I can not express - it is compelled to leave. I will be released - I will necessarily express the opinion.
https://me.yahoo.com/a/7oSzhTkh2N7Q1QA_M1m0fztGR8V_
October 29, 2011, October 29, 2011 23:36, permalink
<a href="http://www.wishes7.com/">wish</a>
thanks for comments
https://me.yahoo.com/a/7oSzhTkh2N7Q1QA_M1m0fztGR8V_
October 29, 2011, October 29, 2011 23:36, permalink
<a href="http://www.wishes7.com/">wish</a>
thanks for comments
https://me.yahoo.com/a/7oSzhTkh2N7Q1QA_M1m0fztGR8V_
October 29, 2011, October 29, 2011 23:37, permalink
<a href="http://www.wishes7.com/">wish</a>
thanks for comments
<a href="http://www.wishes7.com/">wish</a> thanks for comment
I am new to C. Following code extracts directory path, filename and extension. I am not sure this is in a good shape. All I know is this code works. Any suggestions that can improve the code are welcome!