On 10/26/2016 02:06 PM, Wish Dokta wrote:
> Hello,
>
> I am currently writing a basic program to calculate and display the size of
> folders with a drive/directory. To do this I am storing each directory in a
> dict as the key, with the value being the sum of the size of all files in
> that directories (but not directories).
>
> For example:
>
> { "C:\\docs" : 10, "C:\\docs123" : 200, "C:\\docs\\code\\snippets" : 5,
> "C:\\docs\\code" : 20, "C:\\docs\\pics" : 200, "C:\\docs\\code\\python" :
> 10 }
>
> Then to return the total size of a directory I am searching for a string in
> the key:
>
> For example:
>
> for "C:\\docs\\code" in key:
>
> Which works fine and will return "C:\\docs\\code" : 20,
> "C:\\docs\\code\\snippets" : 5, "C:\\docs\\code\\python" : 10 = (35)
>
> However it fails when I try to calculate the size of a directory such as
> "C:\\docs", as it also returns "C:\\docs123".
>
> I'd be very grateful if anyone could offer any advice on how to correct
> this.

Hello-

As you saw in your current approach, using strings for paths can be
problematic in a lot of scenarios. I've found it really useful to use a
higher-level abstraction instead, like what is provided by pathlib in the
standard library. You're obviously using Windows, and you didn't mention
your Python version, so I'll assume you're using something current like
3.5.2 (at least 3.4 is required for the following code).

You could do something like the following:

"""
from pathlib import PureWindowsPath

# From your example
sizes_str_keys = {
    "C:\\docs": 10,
    "C:\\docs123": 200,
    "C:\\docs\\code\\snippets": 5,
    "C:\\docs\\code": 20,
    "C:\\docs\\pics": 200,
    "C:\\docs\\code\\python": 10,
}

# Same dict, but with Path objects as keys, and the same sizes as values.
# You would almost definitely want to use Path in your code (and adjust
# the 'pathlib' import appropriately), but I'm on a Linux system so I had
# to use a PureWindowsPath instead.
sizes_path_keys = {PureWindowsPath(p): s for (p, s) in sizes_str_keys.items()}


def filter_paths(size_dict, top_level_directory):
    for path in size_dict:
        # Given some directory we're examining (e.g. c:\docs\code\snippets),
        # and top-level directory (e.g. c:\docs), we want to yield this
        # directory if it exactly matches (of course) or if the top-level
        # directory is a parent of what we're looking at:

        # >>> pprint(list(PureWindowsPath("C:\\docs\\code\\snippets").parents))
        # [PureWindowsPath('C:/docs/code'),
        #  PureWindowsPath('C:/docs'),
        #  PureWindowsPath('C:/')]

        # so in that case we'll find 'c:\docs' in iterating over path.parents.

        # You'll definitely want to remove the 'print' calls too:
        if path == top_level_directory or top_level_directory in path.parents:
            print('Matched', path)
            yield path
        else:
            print('No match for', path)


def compute_subdir_size(size_dict, top_level_directory):
    total_size = 0
    for dir_key in filter_paths(size_dict, top_level_directory):
        total_size += size_dict[dir_key]
    return total_size
"""

Then you could call 'compute_subdir_size' like so:

"""
>>> compute_subdir_size(sizes_path_keys, PureWindowsPath(r'c:\docs'))
Matched C:\docs\code\snippets
No match for C:\docs123
Matched C:\docs\code\python
Matched C:\docs\pics
Matched C:\docs\code
Matched C:\docs
245
>>> compute_subdir_size(sizes_path_keys, PureWindowsPath(r'c:\docs\code'))
Matched C:\docs\code\snippets
No match for C:\docs123
Matched C:\docs\code\python
No match for C:\docs\pics
Matched C:\docs\code
No match for C:\docs
35
"""

MMR...
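
P.S. Since the stated goal was to display the size of every folder, here is one possible variation on the same 'parents' idea: instead of filtering the dict once per lookup, each directory's own size is added to itself and to every one of its parents in a single pass. This is only a rough sketch that assumes the dict looks like the example above; the name 'cumulative_sizes' is made up for illustration and is not part of pathlib.

```python
from collections import defaultdict
from pathlib import PureWindowsPath

# Example data from the original question: each value is the size of the
# files directly inside that directory (subdirectories not included).
sizes = {
    "C:\\docs": 10,
    "C:\\docs123": 200,
    "C:\\docs\\code\\snippets": 5,
    "C:\\docs\\code": 20,
    "C:\\docs\\pics": 200,
    "C:\\docs\\code\\python": 10,
}


def cumulative_sizes(size_dict):
    """Return a dict mapping each directory (and each of its ancestors)
    to the total size of everything at or below it.

    'cumulative_sizes' is a hypothetical helper name for this sketch.
    """
    totals = defaultdict(int)
    for key, size in size_dict.items():
        path = PureWindowsPath(key)
        # Count the directory's own files toward itself...
        totals[path] += size
        # ...and toward every ancestor, e.g. C:\docs\code, C:\docs, C:\
        for parent in path.parents:
            totals[parent] += size
    return dict(totals)


totals = cumulative_sizes(sizes)
print(totals[PureWindowsPath("C:\\docs")])        # 245
print(totals[PureWindowsPath("C:\\docs\\code")])  # 35
```

Because the comparison works on whole path components, "C:\\docs123" only contributes to itself and to the drive root, never to "C:\\docs". The trade-off is that the result also contains ancestors that were not keys in the original dict (such as the drive root), which may or may not be what you want to display.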