Organize, Search, and Back Up Files with Python’s Pathlib

 

Python’s built-in pathlib module makes working with filesystem paths super simple. In How To Navigate the Filesystem with Python’s Pathlib, we looked at the basics of working with path objects and navigating the filesystem. It’s time to go further.

In this tutorial, we’ll go over three specific file management tasks using the capabilities of the pathlib module:

  • Organizing files by extension
  • Searching for specific files
  • Backing up important files

By the end of this tutorial, you’ll have learned how to use pathlib for file management tasks. Let’s get started!

 

1. Organize Files by Extension

 

When you’re researching and working on a project, you’ll often create ad hoc files and download related documents into your working directory until it becomes cluttered and needs organizing.

Let’s take a simple example where the project directory contains requirements.txt, config files, and Python scripts. We’d like to sort the files into subdirectories, one for each extension. For convenience, let’s name each subdirectory after its extension.

 

Organize Files by Extension | Image by Author

 

Here’s a Python script that scans a directory, identifies files by their extensions, and moves them into respective subdirectories:

# organize.py

from pathlib import Path

def organize_files_by_extension(path_to_dir):
    path = Path(path_to_dir).expanduser().resolve()
    print(f"Resolved path: {path}")

    if path.exists() and path.is_dir():
        print(f"The directory {path} exists. Proceeding with file organization...")

        # Snapshot the listing first, since the loop creates new subdirectories
        for item in list(path.iterdir()):
            print(f"Found item: {item}")
            # Skip directories and files that have no extension
            if item.is_file() and item.suffix:
                extension = item.suffix.lower()
                target_dir = path / extension[1:]  # Remove the leading dot

                # Ensure the target directory exists
                target_dir.mkdir(exist_ok=True)
                new_path = target_dir / item.name

                # Move the file
                item.rename(new_path)

                # Check if the file has been moved
                if new_path.exists():
                    print(f"Successfully moved {item} to {new_path}")
                else:
                    print(f"Failed to move {item} to {new_path}")
    else:
        print(f"Error: {path} does not exist or is not a directory.")

organize_files_by_extension('new_project')

 

The organize_files_by_extension() function takes a directory path as input, resolves it to an absolute path, and organizes the files within that directory by their file extensions. It first ensures that the specified path exists and is a directory.

Then, it iterates over all items in the directory. For each file, it retrieves the file extension, creates a new directory named after the extension (if it doesn’t already exist), and moves the file into this new directory.
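To see how the extension becomes a directory name, here’s a small sketch. The helper extension_folder_name() is a hypothetical name used only for illustration; it is not part of organize.py. It shows what suffix returns for a few kinds of filenames, including ones with no extension.

```python
from pathlib import Path

def extension_folder_name(file_path):
    """Return the subdirectory name for a file, or None if it has no extension.

    Hypothetical helper for illustration only.
    """
    suffix = Path(file_path).suffix.lower()  # e.g. ".PY" becomes ".py"
    return suffix[1:] or None                # drop the leading dot

for name in ["requirements.txt", "Main.PY", "archive.tar.gz", "README", ".gitignore"]:
    print(name, "->", extension_folder_name(name))
```

Note that suffix returns only the last extension (".gz" for archive.tar.gz) and an empty string for names like README or .gitignore, so such files need separate handling.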

After moving each file, it confirms the success of the operation by checking the existence of the file in the new location. If the specified path does not exist or is not a directory, it prints an error message.

Here’s the output for the example function call (organizing files in the new_project directory):

 
[Image: terminal output of organize.py for the new_project directory]
 

Now try this on a project directory in your working environment. I’ve used if-else to handle an invalid path, but you can also use try-except blocks to catch errors raised while creating directories or moving files.
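As one possible way to add try-except handling, here’s a minimal sketch of the move step. The function name move_into_subdir() and its return convention (the new path, or None on failure) are assumptions made for this example. The explicit existence check is there because rename() silently replaces an existing file on Unix but raises FileExistsError on Windows.

```python
from pathlib import Path

def move_into_subdir(item, target_dir):
    """Move `item` into `target_dir`; return the new path, or None on failure.

    Hypothetical helper sketching try-except handling around the move.
    """
    item = Path(item)
    target_dir = Path(target_dir)
    new_path = target_dir / item.name
    try:
        target_dir.mkdir(parents=True, exist_ok=True)
        if new_path.exists():
            # Refuse to overwrite so behavior is the same on every platform
            raise FileExistsError(f"{new_path} already exists")
        item.rename(new_path)
    except OSError as err:
        # OSError covers FileNotFoundError, PermissionError, and FileExistsError
        print(f"Could not move {item}: {err}")
        return None
    return new_path
```

You could call this helper from inside the loop in organize_files_by_extension() in place of the bare rename() call.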

 

2. Search for Specific Files

 

Sometimes you don’t want to organize files into subdirectories by extension as in the previous example. Instead, you may only want to find all files with a specific extension (such as all image files). For this, you can use globbing.

Say we want to find the requirements.txt file to look at the project’s dependencies. Let’s use the same example but after grouping the files into subdirectories by the extension.

If you use the glob() method on the path object as shown to find all text files (defined by the pattern ‘*.txt’), you’ll see that it doesn’t find the text file:

# search.py
from pathlib import Path

def search_and_process_text_files(directory):
    path = Path(directory)
    path = path.resolve()
    for text_file in path.glob('*.txt'):
        # process text files as needed
        print(f'Processing {text_file}...')
        print(text_file.read_text())

search_and_process_text_files('new_project')

 

This is because glob() with the pattern '*.txt' matches only files directly inside the given directory, which does not contain requirements.txt. The requirements.txt file is in the txt subdirectory, so you have to use recursive globbing with the rglob() method instead.

So here’s the code to find the text files and print out their contents:

from pathlib import Path

def search_and_process_text_files(directory):
    path = Path(directory)
    path = path.resolve()
    for text_file in path.rglob('*.txt'):
        # process text files as needed
        print(f'Processing {text_file}...')
        print(text_file.read_text())

search_and_process_text_files('new_project')

 

The search_and_process_text_files() function takes a directory path as input, resolves it to an absolute path, and searches for all .txt files within that directory and its subdirectories using the rglob() method.

For each text file found, it prints the file’s path and then reads and prints out the file’s contents. This function is useful for recursively locating and processing all text files within a specified directory.

Because requirements.txt is the only text file in our example, we get the following output:

Output >>>
Processing /home/balapriya/new_project/txt/requirements.txt...
psycopg2==2.9.0
scikit-learn==1.5.0
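To compare the two methods side by side, here’s a self-contained sketch that builds a throwaway directory tree and runs both. The temporary layout and the notes.txt file are made up for this example; only txt/requirements.txt mirrors the tutorial’s project.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "new_project"
    (project / "txt").mkdir(parents=True)
    (project / "txt" / "requirements.txt").write_text("psycopg2==2.9.0\n")
    (project / "notes.txt").write_text("top-level note\n")  # hypothetical extra file

    def relative_names(paths):
        """Return sorted paths relative to the project root, with forward slashes."""
        return sorted(p.relative_to(project).as_posix() for p in paths)

    top_level = relative_names(project.glob("*.txt"))          # this directory only
    recursive = relative_names(project.rglob("*.txt"))         # all subdirectories too
    also_recursive = relative_names(project.glob("**/*.txt"))  # same as rglob("*.txt")

print(top_level)  # ['notes.txt']
print(recursive)  # ['notes.txt', 'txt/requirements.txt']
```

Calling rglob('*.txt') behaves like glob('**/*.txt'), so you can use whichever reads better in your code.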

 

Now that you know how to use globbing and recursive globbing, try to redo the first task—organizing files by extension—using globbing to find and group the files and then move them to the target subdirectory.
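If you’d like a starting point for this exercise, here’s one possible sketch. The function name organize_with_glob() and the idea of passing in a list of extensions are assumptions for this example, not the only way to do it.

```python
from pathlib import Path

def organize_with_glob(path_to_dir, extensions):
    """Move files matching each extension into a subdirectory of the same name.

    Hypothetical sketch: `extensions` is a list such as ["txt", "py"].
    """
    path = Path(path_to_dir).expanduser().resolve()
    moved = []
    for ext in extensions:
        # Collect the matches up front, before any file is moved
        matches = [p for p in path.glob(f"*.{ext}") if p.is_file()]
        if not matches:
            continue  # don't create empty subdirectories
        target_dir = path / ext
        target_dir.mkdir(exist_ok=True)
        for item in matches:
            new_path = target_dir / item.name
            item.rename(new_path)
            moved.append(new_path)
    return moved
```

For example, organize_with_glob('new_project', ['txt', 'py']) moves only the text files and Python scripts and leaves everything else in place. Unlike the first version, this one does not lowercase extensions, so on case-sensitive filesystems a file such as Main.PY would not match '*.py'.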

 

3. Back Up Important Files

 

So far, we’ve organized files by extension and searched for specific files. How about backing up certain important files?

Here we’d like to copy files from the project directory into a backup directory rather than move the file to another location. In addition to pathlib, we’ll also use the shutil module’s copy function.

Let’s create a function that copies all files with a specific extension (all .py files) to a backup directory:

# back_up.py
import shutil
from pathlib import Path

def back_up_files(directory, backup_directory):
    path = Path(directory)
    backup_path = Path(backup_directory)
    backup_path.mkdir(parents=True, exist_ok=True)

    for important_file in path.rglob('*.py'):
        shutil.copy(important_file, backup_path / important_file.name)
        print(f'Backed up {important_file} to {backup_path}')


back_up_files('new_project', 'backup')

 

The back_up_files() function takes an existing directory path and a backup directory path, and backs up all Python files from the specified directory and its subdirectories into the backup directory.

It creates path objects for both the source directory and the backup directory, and ensures that the backup directory exists by creating it and any necessary parent directories if they do not already exist.

The function then iterates through all .py files in the source directory using the rglob() method. For each Python file found, it copies the file to the backup directory while retaining the original filename. Because only the filename is kept, files with the same name in different subdirectories overwrite one another in the backup directory. Essentially, this function helps in creating a backup of all Python files within a project directory.

After running the script and verifying the output, you can always check the contents of the backup directory:

 
[Image: contents of the backup directory]
 

For your example directory, you can use back_up_files('/path/to/directory', '/path/to/backup/directory') to back up files of interest.
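If your project has files with the same name in different subdirectories, a flat backup directory keeps only one of them. Here’s a sketch of a variant that recreates the relative directory structure inside the backup. The function name back_up_tree() and its pattern parameter are assumptions for this example, and it uses shutil.copy2() instead of shutil.copy() so that file metadata such as timestamps is preserved where possible.

```python
import shutil
from pathlib import Path

def back_up_tree(directory, backup_directory, pattern="*.py"):
    """Copy matching files into the backup, keeping their relative paths.

    Hypothetical variant of back_up_files(); returns the list of copied paths.
    """
    source = Path(directory).resolve()
    backup = Path(backup_directory).resolve()
    copied = []
    # Snapshot the matches first in case the backup lives inside the source
    for src in list(source.rglob(pattern)):
        if not src.is_file() or backup in src.parents:
            continue  # skip directories and files already inside the backup
        dest = backup / src.relative_to(source)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

With this version, new_project/a/main.py and new_project/b/main.py end up as backup/a/main.py and backup/b/main.py rather than overwriting each other.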

 

Wrapping Up

 

In this tutorial, we’ve explored practical examples of using Python’s pathlib module to organize files by extension, search for specific files, and back up important files. You can find all the code used in this tutorial on GitHub.

As you can see, the pathlib module makes working with file paths and file management tasks easier and more efficient. Now, go ahead and apply these concepts in your own projects to handle your file management tasks better. Happy coding!

 

 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.




