Category: Adventure On

  • Crash Course in Technical Support

    AI-assisted coding tools like Claude or Lovable make it easy for anyone to create a web app. This type of enablement means that projects can go from idea to launch really quickly, by, well, anyone.

    But what happens when you get your first users? Figuring out how to support them can take a while.

    I have more than 10 years of experience supporting customers as they figure out how to use SaaS products, and engineers as they extend them.

    Here are some tips for anyone who is trying to learn support on the fly:

    Tip #1: Take A Breath

    In support, everything feels urgent. Very few people write to support and say, “I’m having trouble, but get back to me whenever you can!”. Rather, messages are, “Hey! It’s broken! My work is blocked! Fix it! Fix it now!”

    It is easy to let their panic generate panic in you. Taking a deep breath, reading through their message multiple times, and having a defined support process will help you support users better. I was once given the advice, “Go slow to go fast”. By slowing down and paying attention, you’ll get a better sense of the issue and make fewer mistakes. Taking the time to really understand where the user is stuck will fix their problem faster than assuming.

    Tip #2: Respond In A Timely Manner

    No one likes to feel ignored or dismissed. Responding to a user message within 24 hours, or one business day at the most, will let the user know you received their issue.

    Don’t fall into the trap of thinking that an issue has to be resolved before responding to a user. If the issue can be resolved in under an hour, it is worth it to simply fix the thing and send a message afterwards. However, if resolving the issue will take more than an hour, it is better to respond to the user right away to let them know you received their message, share any ideas you have about the cause of the problem, and tell them you are working on the fix. Once the fix is done, follow up with them to let them know.

    Always send a follow-up if you promise one. Don’t break the user’s trust by promising an update in 24 hours and then not sending it, even if the update is just “I’m still working on it”.

    Tip #3: Get More Info

    It seems like human nature to share problems rather than describe them. Think of when you taste something and it isn’t quite right. What’s the first thing you do? Shove the spoon in the face of the person next to you, demanding they taste it too. You don’t tell them it is too salty, or too spicy, or too…something. You just share that something is off.

    Often, users do the same thing. They write an email to let support know that something is wrong, but they may not think to describe what the problem really is, or how they got there. It may help to create a standard set of questions to send when the issue isn’t clear from the first message.

    Some examples:

    • What browser are you using?
    • What type of computer or device are you using?
    • What happened before you got here?
    • Are there any error messages?
    • Can you send a screenshot?
    • Did this work before?

    Tip #4: Clarify Expectations

    What a user expects to happen is valuable information. Once you understand what the user expects, you can clarify whether something is actually broken or whether the user expects things to work differently than the app was designed.

    Expectations provide actionable feedback to either fix the issue or consider an enhancement. Remember, fixing everything immediately may not be the right decision. For example, if a fix requires a complete refactor of the code but only affects a small number of users, it may not be worth it, especially if there is a workaround users can be directed to. On the other hand, if a fix is relatively simple, it’s almost always a good idea to implement it.

    Tip #5: Replicate The Problem

    If the user reports something is broken, confirm that you can replicate the issue. Use a test account (or even better, a testing environment) to follow the actions the user took and see if the same result can be consistently generated. By testing in multiple scenarios, the scope of the bug or issue can be narrowed down.

    Tip #6: Keep Notes On Every Interaction

    When support requests come in, it is easy to get overwhelmed. Notes will help prevent this. Rather than having to go back and re-read every interaction with a user, notes will act as guide posts for what has been done and what needs to be done next.

    Here is a format that I use:

    User: User Name
    Email: user@userdomain.com
    Account/Website/Identifying Information: xxxx-xxxx-xxxx
    Issue: Customer got stuck trying to create a new widget
    Done: Provided instructions on how to create the widget. Provided link to documentation.
    Next: Nothing at the moment. All set.

    User: User Name
    Email: user@userdomain.com
    Account/Website/Identifying Information: xxxx-xxxx-xxxx
    Issue: Customer is seeing an error message when logging in.
    Done: Checked their login information and account. Was able to replicate the issue. Let the customer know I would investigate.
    Next: Investigate the issue and get back to customer in 24 hours with an update.

    Tip #7: Track Everything

    If you receive feedback from a user about something being broken, or a feature request, keep a note of it. This way, if more users contact you about the same (or similar enough) issues, you have data to inform your next steps.

    When something new launches, it is easy to let the urgency and excitement make it feel like every bug needs to be fixed or feature request implemented. However, that’s an easy way to overcomplicate things or dilute the original value of the app. The app isn’t meant to make everyone happy. It is meant to provide a service. Keep your focus on the original purpose and filter out noise as needed. Listening to users is really important. Being mobbed by them is a distraction.

    Tip #8: Improve Documentation

    If users keep submitting the same question over and over again, the issue may not be with the app but with the documentation. Public documentation which is up-to-date and easy to understand is a key way to support users who are learning how to use your product.

    Tip #9: Define Processes

    As you start figuring out how to help users, start writing standard operating procedures and processes for commonly requested issues. For example, when and how do you give a customer a refund? If there is a bug that needs help from engineering, how do you escalate that? If an account needs to be transferred to a new user, do you allow that? Deciding these types of issues and writing down the process will give a more consistent support experience to your users.

    This can also extend to creating canned responses which can be sent to common questions. This speeds up the process of responding to messages.

    Tip #10: Did I miss anything?

    If you feel there was something I missed, or have a pro-support tip, then please feel free to submit it in the comments!

  • Lovable to Local with Supabase

    I have a client who has created a really neat app via Lovable. They needed help moving from what they have to launch.

    One of the first concerns I had was how to set up a development environment for the project outside of Lovable.

    tl;dr Creating a local environment for a Lovable app required cloning the GitHub repo and then setting up a local Supabase install by seeding data from the remote database.

    Note: This tutorial is for Lovable projects which already have Supabase and GitHub integrated with the app. If your Lovable project isn’t there yet, here is the documentation for integrating Supabase and GitHub.

    Before you get started

    There were a few gotchas which I ran into. I deal with them throughout the tutorial, so you can skip ahead if you like. If you want a heads up of what to watch out for, continue here.

    Self-Hosted via Docker vs. Supabase CLI

    This feels a bit like the struggle of self-hosted WordPress vs WordPress.com. Like WordPress, Supabase is both an open-source project which can be hosted on your own servers via Docker and an enterprise hosting service which will host the databases for you. The important distinction for our tutorial is that while working on a Supabase project locally does require Docker Desktop to be installed and running, you do not need to go through the trouble of setting up the self-hosted Docker environment. That will send you down a rabbit trail. Instead, install the Supabase CLI and initialize the project from there, as described in Steps 3-5.

    The Remote Database Password Is Required During the First Connection

    This may seem like a “well, duh”, but I tried my hardest to work around it, mostly because my client wasn’t sure what the password was and I didn’t want to force them into resetting it, as I wasn’t sure how that would affect the Lovable app. This password isn’t tied to a user account, but to the database itself. I tried just downloading a copy of the backup and then restoring it locally, but that turned out to be its own headache. The native tools built into the Supabase CLI make accessing the remote database much quicker if you have the password. My client and I ended up resetting the password. Luckily, changing it didn’t seem to have any effect on the Lovable app, and I didn’t have to hunt around for places where the password was used.

    Step 1: Install Supabase CLI

    If you already have Supabase CLI installed, then you can skip ahead.

    If not, the documentation for installing the CLI can be found here: https://supabase.com/docs/guides/local-development/cli/getting-started

    Click on the environment you are using to install the CLI and it will provide the instructions.

    In my case (macOS), I used Homebrew via command line in Terminal.

    brew install supabase/tap/supabase
    

    Step 2: Clone the Lovable GitHub repo

    $ git clone <Your Repo URL>
    

    Step 3: Initialize Local Supabase

    Access the folder where you cloned the Lovable project from GitHub. There should already be a supabase directory. It may contain sub-directories like functions and migrations.

    % cd <Your Repo Name>
    

    You probably don’t want to commit any of the local configuration files to the repo which you share with Lovable. Add the following items to your .gitignore file:

    # Ignore Supabase local Development files
    supabase/seed.sql
    supabase/.branches/
    supabase/.temp/
    supabase/functions/.env
    supabase/config.toml
    

    Important: Make sure to save and commit the change to the .gitignore file before running supabase init. This way, Git won’t be trying to track the files Supabase will create.

    Next, run the Supabase initialization:

    <Your Repo Name>% supabase init
    

    Step 4: Log In & Link to the Remote Project

    Now that a local Supabase project has been initialized, the database can be connected to your remote project.

    To log into your Supabase account where the remote project is:

    <Your Repo Name>% supabase login
    

    The response will be a link to open in a browser.

    Hello from Supabase! Press Enter to open browser and login automatically.
    
    Here is your login link in case browser did not open https://supabase.com/dashboard/cli/login?session_id={...}
    
    

    The link will either prompt you to log in to your Supabase account or immediately redirect you to the authorization page. Once logged in, the authorization page will show a verification code. Copy this into your terminal.

    Enter your verification code: {AUTHCODE}
    
    Token cli_jessboctor@{machinename}.local_{...} created successfully.
    
    You are now logged in. Happy coding!
    

    Once you have logged in, you can link the local instance to a remote project:

    <Your Repo Name>% supabase link
    

    The response will be a list of the projects in your remote Supabase workspace. Use the ⬆️ and ⬇️ keys to highlight the project you want and press enter.

    Here, you may be asked for the database password for this project. This is not your user account password; you should have set it when you created the Supabase project. If the password is valid, the supabase/config.toml file will be updated to reflect the project which you linked to.
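
    If the interactive picker gives you trouble, the CLI also accepts the project ref directly; it is the string in your Supabase dashboard URL:

    <Your Repo Name>% supabase link --project-ref <Your Project Ref>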

    Step 5: Seed & Start the Database

    To set up the local database with pre-populated data, you need to download a seed file from the remote Supabase project.

    <Your Repo Name>% supabase db dump --data-only > supabase/seed.sql
    
    

    This command will download a seed SQL file to your-repo-name/supabase/seed.sql.

    To start the database:

    <Your Repo Name>% supabase start
    

    Once the servers have started, you should see a list of local URLs and ports which you can use for different things. For example, the “Studio” allows you to access a local version of the Supabase dashboard.

    Step 6: Update the environment keys

    Now that the local Supabase tables are running, we need to connect the app to the local tables rather than the remote ones.

    Open the Supabase Studio URL in a browser (it is most likely http://127.0.0.1:54323). When you see the Supabase dashboard, click on the “Connect” button in the upper right-hand corner.

    When the connect dialog opens, click on the “App Frameworks” button. You need to copy the “{…}_Supabase URL” and “{…}_Supabase_Anon_Key” values from the dialog.

    In your code editor, open the your-repo-name/integrations/supabase/client.ts file. We need to replace the SUPABASE_URL and SUPABASE_PUBLISHABLE_KEY values. The file should look like this:

    // This file is automatically generated. Do not edit it directly.
    import { createClient } from '@supabase/supabase-js';
    import type { Database } from './types';
    
    const SUPABASE_URL = "<Your Remote Supabase URL>";
    const SUPABASE_PUBLISHABLE_KEY = "<A really long string>";
    
    // Import the supabase client like this:
    // import { supabase } from "@/integrations/supabase/client";
    
    export const supabase = createClient<Database>(SUPABASE_URL, SUPABASE_PUBLISHABLE_KEY);
    
    

    You will want to replace the values with the ones you copied from the local Supabase Studio Connect:

    // This file is automatically generated. Do not edit it directly.
    import { createClient } from '@supabase/supabase-js';
    import type { Database } from './types';
    
    const SUPABASE_URL = "http://127.0.0.1:54321";
    const SUPABASE_PUBLISHABLE_KEY = "<A really long but different string>";
    
    // Import the supabase client like this:
    // import { supabase } from "@/integrations/supabase/client";
    
    export const supabase = createClient<Database>(SUPABASE_URL, SUPABASE_PUBLISHABLE_KEY);
    
    

    You will also need to either create a .env.local file to override the values in the .env file, or replace them in the .env file directly.
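
    If you go the .env.local route, it could look like the snippet below. Vite gives .env.local priority over .env, but the exact variable names vary by project, so copy the keys from your own .env file; the names here are placeholders:

    # .env.local: local overrides, kept out of version control
    VITE_SUPABASE_URL="http://127.0.0.1:54321"
    VITE_SUPABASE_PUBLISHABLE_KEY="<A really long but different string>"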

    Step 7: Start the webapp

    Conveniently, the README.md file from Lovable includes instructions on how to stand up a local version of the web app:

    # Step 1: Clone the repository using the project's Git URL.
    git clone <YOUR_GIT_URL>
    
    # Step 2: Navigate to the project directory.
    cd <YOUR_PROJECT_NAME>
    
    # Step 3: Install the necessary dependencies.
    npm i
    
    # Step 4: Start the development server with auto-reloading and an instant preview.
    npm run dev
    

    Running npm run dev will print another local URL (http://localhost:8080/). When you load this URL in a browser, you should see a local version of your Lovable app!

    That’s it! You should be ready to go!

    Pro-tip: Lovable commits the .env file into the Git repo. Keeping track of the environment variables there is a pain and can lead to accidentally committing your local Supabase_URL and Anon_key to the repo (ask me how I know).

    Here is the workaround I figured out:

    1. Remove .env from the Git repo using git rm --cached .env
    2. Add .env to your .gitignore file and commit the change
    3. Edit the .env file to contain an environment variable which can easily be set to "true" or "false": VITE_IS_LOCAL="true"
    4. Edit the integrations/supabase/client.ts file to set the SUPABASE_URL and SUPABASE_PUBLISHABLE_KEY based on the environment variable:
    // This file is automatically generated. Do not edit it directly.
    import { createClient } from '@supabase/supabase-js';
    import type { Database } from './types';
    
    let SUPABASE_URL = "";
    let SUPABASE_PUBLISHABLE_KEY = "";
    const isLocal = import.meta.env.VITE_IS_LOCAL === "true";
    
    if (isLocal) {
        SUPABASE_URL = "http://127.0.0.1:54321";
        SUPABASE_PUBLISHABLE_KEY = "<The long local string>";
    } else {
        SUPABASE_URL = "<Remote project URL>";
        SUPABASE_PUBLISHABLE_KEY = "<The long remote string>";
    }
    
    // Import the supabase client like this:
    // import { supabase } from "@/integrations/supabase/client";
    
    export const supabase = createClient<Database>(SUPABASE_URL, SUPABASE_PUBLISHABLE_KEY);
    
    

    Deploying Changes

    To keep version control continuous between Lovable and Supabase, changes to supabase/migrations and supabase/functions need to be committed twice: first via the Supabase CLI to push the changes to the remote database, and then via Git to push the changes to Lovable.
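
    For example, a round of changes might be pushed out like this (a sketch; swap in your own Edge Function name):

    # Push new migrations to the remote database
    supabase db push
    
    # Deploy an updated Edge Function
    supabase functions deploy <function-name>
    
    # Commit the same files to Git so Lovable stays in sync
    git add supabase/
    git commit -m "Add migration and update function"
    git push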

    You can find information about migrations and deploying changes here: https://supabase.com/docs/guides/deployment/database-migrations

    You can find information about Edge Functions and deploying changes here: https://supabase.com/docs/guides/functions/quickstart-dashboard


    Does this process work for you? Got any great tips on how to make it work even better? Let me know!

  • Deduplicating 14K Posts (Part II)

    In my last post, I walked through the start of how I deduplicated 14,000 Document Library Pro posts. That post covered how I crafted a script to search through PDF attachment posts and delete any duplicates based on the filename.

    The reason for starting with the PDFs was to make sure that any duplicate PDF attachments referenced by dlp_document posts were deleted first. By cleaning out the PDFs, I had a clean base to check the dlp_documents against.

    Now, it was time to move on to cleaning out the actual dlp_document post type.

    jb-dlp-document-deduplication.php

    Once I had written the script to deduplicate the PDFs, I was able to use it as a base for the dlp_document posts. Many of the properties and functions are the same with some slight name tweaking for dlp_document instead of pdf_media.

    The general concept is the same: new WP-CLI commands are introduced which run the deduplication script, clear out the options table, and clear out any log files.

    One of the ways this script differs is that rather than keeping the earliest posts as the originals, we keep the newest. With the PDFs, we saved the earliest posts to avoid any -1, -2, or other suffixes added to the URLs and file names. However, with the dlp_document posts, the most recent post has the most complete information regarding taxonomies and excerpts. So rather than querying the database in ascending order, we pull the dlp_document posts in descending order by their post ID.

    $results = $wpdb->get_results(
       $wpdb->prepare(
          "
          SELECT * FROM {$wpdb->posts}
          WHERE post_type = %s
          AND ID < %d
          ORDER BY ID DESC
          LIMIT %d
          ",
          'dlp_document',
          $this->start_post_id,
          $this->batch_size
       )
    );
    

    Another way this script differs is that rather than handling just duplicate posts, we also had to verify that the PDF file stored in the post meta actually exists. If the file doesn’t exist, the dlp_document post needed to be deleted. It doesn’t help anyone to have a dead link displayed on the site.

    Rather than trying to shove this extra check into the current process of handle_duplicate_post(), I decided to create a separate flow for posts where the PDF file did not exist. This allowed me to log the instances of duplicate posts and missing PDFs separately.

    If you have read the previous post, you are generally familiar with the flow of handling duplicate posts. Rather than repeating the concepts, here I will focus just on checking for missing PDFs.

    It starts within the foreach loop in deduplicate_dlp_docs(). Before checking whether a document is a duplicate, the loop checks that the PDF is valid by calling determine_if_pdf_exists().

    determine_if_pdf_exists()

    The method takes the object of the dlp_document post as an argument. This object contains the ID and title of the document post. We use this to handle fetching the post meta where the PDF information is stored.

    Here is where things got tricky: the Document Library Pro plugin allows a PDF to be saved to a post using one of two options, a direct URL or a post ID. The posts in the customer’s database used both options intermittently.

    To determine how the PDF is saved to the post, the first thing I had to fetch was the “link type”:

    // Confirm that a PDF file is attached by checking the post meta.
    // Note: get_post_meta() returns an empty string, not null, when the key is missing.
    $pdf_link_type = get_post_meta( $dlp_document_post->ID, '_dlp_document_link_type', true );
    
    

    The expected values for $pdf_link_type are either “url” or “file”. If the value is anything else, we should delete the dlp_document post because it is incomplete without an attached PDF.

    The type of link determines the meta key for the post meta containing the actual PDF information. For example, for url the meta key is _dlp_direct_link_url. For file, the meta key is _dlp_attached_file_id. The simplest way to handle all three cases (url, file, anything else) was to create a switch statement.

    For the url and file cases, I pull the post meta from the database. If the post meta exists, I then check that the value it provides (e.g. a URL or post ID) actually exists.

    If the data is sound, it is returned to deduplicate_dlp_docs() as part of an array. Otherwise, handle_missing_pdf_file() is called.

    switch ( $pdf_link_type ) {
       case 'url':
           $pdf_file_path = get_post_meta( $dlp_document_post->ID, '_dlp_direct_link_url', true );
           // If the post meta does not exist (an empty string), the PDF file is missing
           if ( empty( $pdf_file_path ) ) {
              $this->handle_missing_pdf_file( $dlp_document_post, $pdf_link_type, null );
              return $attached_pdf_meta;
           }
    
           // If the post meta exists, check that the file exists
           if ( ! file_exists( $pdf_file_path ) ) {
              $this->handle_missing_pdf_file( $dlp_document_post, $pdf_link_type, $pdf_file_path );
              return $attached_pdf_meta;
           }
    
           $attached_pdf_meta['link_type'] = $pdf_link_type;
           $attached_pdf_meta['pdf_file']  = $pdf_file_path;
           break;
       case 'file':
           $pdf_post_id = get_post_meta( $dlp_document_post->ID, '_dlp_attached_file_id', true );
           // If the post meta does not exist, we assume the PDF file is missing
           if ( empty( $pdf_post_id ) ) {
              $this->handle_missing_pdf_file( $dlp_document_post, $pdf_link_type, null );
              return $attached_pdf_meta;
           }
    
           // If the post meta contains an attachment post ID, check that the post exists
           if ( ! get_post_status( $pdf_post_id ) ) {
              $this->handle_missing_pdf_file( $dlp_document_post, $pdf_link_type, $pdf_post_id );
              return $attached_pdf_meta;
           }
    
           $attached_pdf_meta['link_type'] = $pdf_link_type;
           $attached_pdf_meta['pdf_file']  = $pdf_post_id;
           break;
       default:
           // If the DLP Document post is neither a direct link nor a media library attachment, it should be deleted
           $this->handle_missing_pdf_file( $dlp_document_post, $pdf_link_type, null );
           break;
    }
    

    handle_missing_pdf_file()

    Similar to handle_duplicate_post(), this method warns the user that a document with a nonexistent PDF was found and confirms whether the user wants to proceed with logging (dry run) or deleting (for real) the dlp_document post.

    In both scenarios, information about the dlp_document post and PDF are gathered to be later logged into a CSV.

    gather_missing_pdf_posts_data()

    This method takes the post object for the dlp_document post and the metadata for where a PDF was expected to exist as arguments. It then pushes that information into the stash_of_missing_pdf_posts array.

     $this->stash_of_missing_pdf_posts[] = array(
         'dlp_document_post_id'      => $dlp_doc_post->ID,
         'dlp_document_post_title'   => $dlp_doc_post->post_title,
         'pdf_link_type'             => $pdf_link_type,
         'missing_pdf_id_or_url'     => $missing_pdf_id_or_url,
    );
    

    Once the missing dlp_document is logged and handled, the code returns to determine_if_pdf_exists(), which returns an empty array to deduplicate_dlp_docs(). The empty array indicates to the foreach loop, where everything started, that there is nothing else to do with this post. There is no point in checking whether the post is a duplicate since it was already logged and possibly deleted. The loop continues on to the next post in the query results.

    If determine_if_pdf_exists() returns a non-empty array, the code continues on to check whether the dlp_document post is a duplicate of a post which was already found and tracked.
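
    Put together, the top of the loop looks roughly like this (a sketch based on the description above, not the verbatim script):

    foreach ( $results as $dlp_document_post ) {
        // An empty array means the PDF was missing and the post was
        // already logged (and possibly deleted) by handle_missing_pdf_file().
        $attached_pdf_meta = $this->determine_if_pdf_exists( $dlp_document_post );
        if ( empty( $attached_pdf_meta ) ) {
            continue;
        }
    
        // Otherwise, fall through to the duplicate checks described in the previous post.
    }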

    Once the foreach loop concludes, both the missing PDF and duplicate posts are logged. I used separate CSV files to allow different information to be stored in each CSV and to make it easier to parse how many posts were true duplicates and how many had invalid documents saved in the post meta.

    Script Clean Up

    Since there are two types of log files for the dlp_document posts, I created a third clean-up command, dlp-document-missing-pdf-delete-logs. This gives the user flexibility to delete just the duplication log files (via dlp-document-dedup-delete-logs) or just the logs for the missing PDFs.

    Results

    PDF Deduplication:

    • Processed 6,994 PDF posts
    • Total duplicate posts found: 820
    • Unique PDF posts found: 6,174

    DLP_Document Deduplication:

    • Processed 14,402 DLP Document posts
    • Total duplicate posts found: 6,108
    • Total posts with a missing PDF file found: 3,151
    • Unique DLP Document posts found: 5,143

    Now the library is all cleaned up. There are no more duplicate posts and all links to PDFs should work properly.

    Have a site which needs some data clean up? I’m available! Fill out the contact form below to reach out.


  • Deduplicating 14,000 Posts

    I have been recently working with a client to give their site a refresh. Rather than rebuild the entire thing, they wanted to make sure their current site was up-to-date and make a few key functionality improvements. One of these improvements is to clean up a library of PDF files they have hosted using Document Library Pro.

    The Problem

    As near as I can tell, whoever set up the library did a post import. But things didn’t work the way they expected so they did another import. And another one…all without clearing out the previously imported posts. This resulted in multiples of each document being added to the website.

    For fun additional complexity, each “dlp_document” is tied to a PDF file which may be uploaded via the Media Library or attached to the post via the “direct URL” metadata. Or the file may not exist at all. This means we also need to remove any duplicate PDF files, plus check that any file which a dlp_document has saved in its metadata actually exists.

    The Process

    Manually checking 14K+ documents would not only be time consuming, but would also leave lots of room for error. Instead, I decided to do the clean up by writing scripts within a plugin. The scripts are then executable via custom WP-CLI commands.

    When it came to the order the actions needed to happen in, I decided to approach the problem by breaking it down into three steps across two scripts:

    1. Remove any duplicate PDFs
    2. Remove document posts where a PDF does not exist
    3. Remove any document posts which are duplicates

    The Code

    You can find the plugin code here: https://github.com/JessBoctor/jb-deduplication

    The main plugin file, jb-deduplication.php, is really basic. Essentially, it is just used to load the two script files into WordPress.

    jb-pdf-media-deduplication.php

    The jb-pdf-media-deduplication.php file holds the PDF_Media_Deduplication_Command class plus two other clean up commands.

    There are a number of properties in the PDF_Media_Deduplication_Command class. The first four are all arguments which control the scope of the script.

    • $dry_run – Run the script without actually deleting things
    • $skip_confirmations – Skip manually checking duplicates
    • $batch_size – The number of posts to check
    • $start_post_id – Where to start the query

    The remaining properties are all used by the script to track progress.

    • $last_post_id – The ID of the last post to get checked
    • $unique_post_titles – An array of unique post titles which can be checked against for duplicates
    • $duplicate_posts_to_log – A nested array of data which tracks duplicate posts which are found
    • $total_duplicate_posts – A count of the duplicate posts which are found

    __invoke

    This function is similar to __construct in that it is the first thing called when the command is run. In the case of WP-CLI, you only want to use the __construct function if you are using data from outside the class, rather than the command arguments, to run the command. For example, if you had options stored in the wp_options table, you could fetch those options, pass them to a new instance of the class, and then when the WP-CLI command is run, it would use those pre-set options.

    In the case of this script, all we need are the arguments passed from calling the WP-CLI command, so we can skip __construct. Instead, we just use __invoke to set our class properties and get the ball rolling.
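
    For reference, WP-CLI hands __invoke the positional and associative arguments from the command line. A minimal sketch (the flag names and exact signatures are assumed from the properties above):

    public function __invoke( $args, $assoc_args ) {
        // Associative args arrive from --flags on the command line.
        $this->dry_run            = isset( $assoc_args['dry-run'] );
        $this->skip_confirmations = isset( $assoc_args['skip-confirmations'] );
        $this->batch_size         = (int) ( $assoc_args['batch-size'] ?? $this->batch_size );
        $this->start_post_id      = $this->determine_start_post_id( $assoc_args );
    
        $this->deduplicate_pdfs();
    }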

    $batch_size, $start_post_id, and $unique_post_titles

    Since there is such a large number of posts which needed to be sorted, I wanted to be able to run the script in batches. This way, I could spot check small amounts of posts. However, since the goal is to find unique posts across the whole data set, I needed to figure out a way not to lose track of the progress made between different batches.

    determine_start_post_id()

    This method determines where a batch should start its post query. If the --start-post-id argument is passed with the WP-CLI command, then that is the post ID which is used as a starting point. However, I don’t want to have to remember where the last batch run ended. Instead, the $last_post_id property is stored in the wp_options table as 'pdf-deduplication-start-post-id' (mouthy, I know). This way, if a user runs continuous batches, the script can pull the next start post ID from the options table. If there is no saved post ID and no --start-post-id argument, then the start post ID uses the default property value of 1.

    In a similar way, I don’t want to lose track of the unique posts which were found during each batch run. The $unique_post_titles property is an empty array by default. To keep it up to date, if any unique post titles are found during a batch run, they are saved to the wp_options table as pdf-deduplication-unique-post-titles. When the __invoke function is called, it checks for this option and loads any previously found unique post titles to the $unique_post_titles property before starting the deduplication process.
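
    In code, that persistence is just a get_option/update_option pair. A sketch using the option names above:

    // In __invoke(): load anything found by previous batches before processing starts.
    $this->unique_post_titles = get_option( 'pdf-deduplication-unique-post-titles', array() );
    
    // After the batch: save the updated list and the next starting point.
    update_option( 'pdf-deduplication-unique-post-titles', $this->unique_post_titles );
    update_option( 'pdf-deduplication-start-post-id', $this->last_post_id );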

    deduplicate_pdfs()

    This is where the main “deduplication” action happens. It gets called at the very end of __invoke once the class properties have been set up. The method does four things:

    1. Fetches all PDF attachment posts
    2. Handles the post if it is a duplicate or unique
    3. Updates the $unique_post_titles records
    4. Logs the result of the batch run

    get_pdf_posts()

    This is how we fetch the PDF attachment posts. It runs a simple query for any PDF posts in the media library:

    global $wpdb;
    
    $results = $wpdb->get_results(
       $wpdb->prepare(
          "
          SELECT * FROM {$wpdb->posts}
          WHERE post_type = %s
          AND post_mime_type = %s
          AND ID > %d
          ORDER BY ID ASC
          LIMIT %d
          ",
          'attachment',
          'application/pdf',
          $this->start_post_id,
          $this->batch_size
       )
    );
    

    One of the things which turned out to be key in the deduplication process is the order of the post results. Since we want to use the earliest version of the PDF file which was uploaded, to avoid keeping any PDF files with -1 or -2 suffixes, the post results have to be in ascending order.

    Once we have the results, we can set the $last_post_id property for the class. This will let us keep track of where the batch for the script ended.

    // Set the last_post_id property to the last post ID in the results, if any
    if ( ! empty( $results ) ) {
        $last_post = end( $results );
        $this->last_post_id = $last_post->ID;
    }
    

    The results get returned to deduplicate_pdfs() to be looped through a series of logic filters.

    To start, we save $post->post_title into a separate variable, $post_title. This allows us to fuzzy match the post title against known unique titles by stripping -1, -2, and -pdf out of the post title without changing the original $post->post_title. Each of these variations of $post_title is checked against the $unique_post_titles array. If a match is found, the $post object and the ID of the post with the matching title get sent through handle_duplicate_post().

    If there isn’t a match from the four variations, then the post is considered unique. The post gets added to $unique_post_titles in a $post->ID => $post->post_title key => value pair.
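
    Here is a sketch of that check; the exact suffix stripping in the plugin may differ, but the shape is the same:

    // Build the variations of the title to fuzzy match against known unique titles.
    $post_title = $post->post_title;
    $variations = array_unique( array(
        $post_title,
        preg_replace( '/-1$/', '', $post_title ),
        preg_replace( '/-2$/', '', $post_title ),
        preg_replace( '/-pdf$/', '', $post_title ),
    ) );
    
    // $unique_post_titles maps post ID => title, so array_search() returns the matching ID.
    $matching_post_title_id = false;
    foreach ( $variations as $variation ) {
        $matching_post_title_id = array_search( $variation, $this->unique_post_titles, true );
        if ( false !== $matching_post_title_id ) {
            break;
        }
    }
    
    if ( false !== $matching_post_title_id ) {
        $this->handle_duplicate_post( $post, $matching_post_title_id );
    } else {
        // No match: record this post as unique.
        $this->unique_post_titles[ $post->ID ] = $post->post_title;
    }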

    handle_duplicate_post()

    If a PDF attachment $post is considered to be a duplicate, we need to confirm that the user wants to continue, log the post, and most likely delete the $post and its uploaded file.

    In the case of a dry run (without skipping confirmations), the script will confirm that the user wants to log the duplicate PDF. In the case of the code being run for real, it will ask the user if they want to delete the post and file. If the user responds with anything other than “yes”, the script will exit mid-run.

    When the user gives a “yes”, the first thing that happens is that some basic information for the original PDF file and the duplicate gets saved via gather_duplicate_posts_data().

    Once the information is saved, in the case of a real run, the attachment is deleted via a call to wp_delete_attachment().

    gather_duplicate_posts_data()

    This method captures the post ID, title, and URL of the original and duplicate PDF posts. In the case of the duplicate, it will also attempt to capture the size of the file. This way, we can see how much data is being removed.

    $this->duplicate_posts_to_log[] = array(
      'original_post_id'          => $matching_post_title_id,
      'original_post_title'       => $this->unique_post_titles[$matching_post_title_id],
      'original_pdf_url'          => get_attached_file( $matching_post_title_id ),
      'duplicate_post_id'         => $duplicate_post->ID,
      'duplicate_post_title'      => $duplicate_post->post_title,
      'duplicate_pdf_url'         => $duplicate_file,
      'duplicate_pdf_file_exists' => $duplicate_file_exists,
      'duplicate_pdf_filesize'    => $duplicate_file_size
    );
    

    The data is added to the $duplicate_posts_to_log property as a nested array. This allows us to use each array as a row in a CSV file which gets created by log_results().

    Once each post object in the query is checked for a duplicate, the pdf-deduplication-unique-post-titles option is updated to match the current version of the $unique_post_titles array via save_unique_post_titles_to_options().

    log_results()

    Once the unique posts are recorded, the duplicates get logged. In addition to printing some basic stats about the batch in the command line, the method makes use of the built-in WP_CLI\Utils method write_csv to create a CSV file containing the information in $duplicate_posts_to_log.

    The file gets stored in the plugin directory under “logs”.
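
    A sketch of that logging step, assuming the logs directory sits inside the plugin folder and a timestamped file name:

    if ( ! empty( $this->duplicate_posts_to_log ) ) {
        $log_path = plugin_dir_path( __FILE__ ) . 'logs/pdf-deduplication-' . gmdate( 'Ymd-His' ) . '.csv';
        $handle   = fopen( $log_path, 'w' );
        $headers  = array_keys( $this->duplicate_posts_to_log[0] );
    
        // write_csv() takes a file handle, the rows, and an optional header row.
        \WP_CLI\Utils\write_csv( $handle, $this->duplicate_posts_to_log, $headers );
        fclose( $handle );
    }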

    The script is done. Any duplicates have been logged and deleted, and the PDF attachments have been cleaned up.

    Script Clean Up

    To avoid bloat from running the script, I created two extra WP-CLI commands, pdf-media-dedup-clear-options and pdf-media-dedup-delete-logs. These clear out any options created in the wp-options table and delete any log files, respectively.
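
    Both are thin wrappers. Something along these lines, with the option names from earlier and the file handling assumed:

    WP_CLI::add_command( 'pdf-media-dedup-clear-options', function () {
        delete_option( 'pdf-deduplication-start-post-id' );
        delete_option( 'pdf-deduplication-unique-post-titles' );
        WP_CLI::success( 'Deduplication options cleared.' );
    } );
    
    WP_CLI::add_command( 'pdf-media-dedup-delete-logs', function () {
        foreach ( glob( plugin_dir_path( __FILE__ ) . 'logs/*.csv' ) ?: array() as $log_file ) {
            unlink( $log_file );
        }
        WP_CLI::success( 'Log files deleted.' );
    } );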

    To be continued…

    Follow along for the breakdown of jb-dlp-document-deduplication.php and how it clears out not only duplicates, but also posts with bad references. Exciting stuff!

    Update!

    Part two can be found here:

  • Avoid over-engineering with better (and earlier) questions

    A few weeks ago, the mini-client and I were driving home when he asked if we could start a new project. I asked what he had in mind.

    Photo by David Ruh on Pexels.com

    He wanted snakes. So we started discussing the materials we had on hand and what he envisioned. We landed on making snakes out of paper-towel rolls.

    As he chattered in the backseat, I let my mind wander to how I would engineer a snake from a paper towel roll. In theory, they were similar shapes, both long cylinders. It was the issue of movement I was trying to solve.

    Snakes slither in a motion that requires flexibility. So, rather than a solid cylinder that spans the full length of the “snake”, a set of linked smaller cylinders would be the better option.

    But how to produce that shape from a paper-towel roll?

    Photo by Jessica Lewis 🦋 thepaintedsquare on Pexels.com

    I thought my solution was pretty ingenious. I folded a crease into the top of the roll, and used double-sided sticky tape to create a “spine”. Then, I made vertical cuts along the bottom to allow for movement as the snake “slithered”.

    The problem? The more time I spent folding, cutting, and taping, the more distressed the mini-client became.

    I tried to reassure him about how cool it would be. I explained I was making it “move like a snake”. All of my explanations did nothing to convince the mini-client that what I was doing was the right way to make a snake.

    Why?

    Because it wasn’t what he wanted.

    When I finally stopped to ask him what he thought it should look like, it was simply a snake drawn on a cardboard roll. And the mouth should open.

    🤦‍♀️ 😅 😂

    This was a reminder that as creatives, engineers, and ideas-people, it is so easy to get carried away with how cool we can make it. It’s important not to skip over checking with the client to ensure we are building what they want.

    I was building the mini-client a web app. He wanted a static page with a gif.

    The brief

    The PR

    Eventually, we got on the same page and I was able to deliver a product the mini-client was happy with. However, I wasted time and materials in pursuit of my own goals. That is never ideal.

    I’m thankful for the reminder to invest time in asking better questions earlier in the creative process to meet client expectations.

  • Shipped: A new subscription management experience on WordPress.com

    Last week, a new subscription management experience was shipped to WordPress.com users.

    Before*

    After*

    The best part? The users probably didn’t even notice (yet) 🎉

    The problem

    As a WordPress.com user, understanding where to find a particular subscription was a bit of a headache. There were previously two options.

    1. Go to My WordPress.com Account > Purchases and scroll through a list of all purchases on the account. This includes WordPress.com, Jetpack, Akismet, user-to-user products, certain WooCommerce solutions, Professional Email, and marketplace purchases like plugins and themes. It’s a lot.
    2. Go to a specific site home, click on Upgrades > Purchases and then see the list of purchases for just that site. This narrows the field a bit, but also means you have to move in and out of different sites to use this filtered list.

    The feature request & solution

    In 2024, David Rothstein pointed out that the interface of the global purchases list would be improved if it could be easily searched, filtered, and sorted. I took on that challenge with the help of Payton Swick.

    After an initial investigation, we decided that rather than building custom functionality, we would migrate the current layout (which was a faked table) to use the DataViews component built into WordPress Core.

    Project considerations

    The biggest challenge of the project was how to complete that conversion. There were a few key aspects which guided how we defined milestones:

    • The Purchases management screen is used by millions of paying WordPress.com users. We couldn’t simply take down the original version and replace it with an “excuse our dust” page while we completed the changeover.
    • We didn’t want to redesign the page. The goal here was to add functionality, not move the user’s cheese and break years of muscle memory for paying customers.
    • The previous Active upgrades line items were generated by a class component called PurchaseItem. While it appeared as though there were four distinct sections (Sites, Product, Status, and Payment method), these were all tied up in a single class component and couldn’t be accessed individually.

    The process

    In order to prevent breaking current functionality, I copied over the previous purchases-list directory into a new version called purchases-list-in-dataviews. Then I created a feature flag, purchases/purchase-list-dataview. This allowed the work to be done behind the scenes and shown only in certain environments.

    The next phase was to get the PurchaseItem component to render within the DataViews table. I started out by rendering a table layout that contained the PurchaseItem component as a single column. Once that was done, I started to break the PurchaseItem class into individual functional components which could each be rendered into individual columns:

    1. Sites
    2. Product
    3. Status
    4. Payment method

    While it would be great to say it only took five PRs to get this to happen, we all know development doesn’t work that way 😅. There were a few hiccups, including a dive into how to render backup payment methods and deciding how to display user-to-user (Membership) purchases.

    The good news is that the project kept moving forward. Which got us to the point where we could start to implement the fun part: searching, sorting, and filtering.

    Search

    There is now a functional search box which can be used to search across a number of data points, including the site name, the product type, the service, and the payment method.

    Filter

    By clicking on the tornado-looking icon, the user has three filters available: Site, Type, and Expiring soon status.

    Sort

    The sort feature can be found within the table settings gear icon. The user can choose which data point to sort by and whether to sort in ascending or descending order.

    What’s next?

    I hope users begin to discover the newest features and are delighted by the results.

    There are also improvements to be made to the API so that the filtering can be done server-side for performance reasons. There is opportunity for a bit of clean up, such as deprecating the old PurchasesSite component. These are things which the payments teams at Automattic can prioritize.

    As for me, I am open to new roles and projects. If you have a legacy code base in need of refactoring to improve your user experience, please feel free to reach out!


    * Before you get excited about seeing credit card numbers, these are all Stripe test card numbers.

  • Making a SCUD Plugin

    I am working with a company that needs a nicely formatted way to display a group of employees. Currently, the team page uses hardcoded [author] shortcodes for its formatting, which makes it a pain for the staff to update the team information.

    So they don’t.

    So, I am creating a Simple Custom User Display, or SCUD, plugin.

    Did you think I was talking about missiles? Also, did you know there is a type of baby shrimp called a scud? You can learn about them here.

    I am sure there are already WordPress plugins with similar functionality to display users in a page archive. However, in this case, the company I am working with doesn’t need a bunch of bells and whistles. Since they don’t have someone who regularly handles updating the site, keeping things light and less likely to run into update conflicts is best.

    Also, I haven’t built a plugin from scratch in a bit. The majority of my recent work has been in digging through legacy code and surgically making improvements and refactors. Building something new seems like fun.

    Before I started writing any code, I spent some time thinking through the plugin and what this company needed. I thought it might be an interesting exercise to share.

    Problem Statement:

    The “Company” needs to improve their current “Team” page. Currently, it is out of date and not easily updated by the staff at the Company. The page uses the [author] shortcode to format contact information which is hardcoded directly into the page. The metadata for each team member is not pulled from a custom post type or user profile. This means that in order to edit a single piece of information (e.g. update a team member’s title), the user must sort through the page code to make changes.

    Solution:

    Create a lightweight plugin which uses as much core WordPress functionality as possible to make each team member’s information its own dataset. This way, the team member information can easily be pulled, sorted, filtered, and displayed on the front end. To make this easily editable in the future, we utilize WordPress users to hold the team member information and metadata.

    Information store: Users

    The current website displays the team members grouped by their team and state. We can maintain this information by creating a taxonomy for users. This looks to be a simple case of registering a custom taxonomy on the user object type, as sketched below.
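
    A minimal sketch of that registration; the taxonomy name and labels here are illustrative:

    // Attach a "team" taxonomy to users instead of a post type.
    add_action( 'init', function () {
        register_taxonomy(
            'scud_team',
            'user',
            array(
                'public' => true,
                'labels' => array(
                    'name'          => 'Teams',
                    'singular_name' => 'Team',
                ),
            )
        );
    } );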

    After creating the taxonomies, we want to limit the capabilities of the team member users. We can do this by creating a custom user role, “Team Member”, and limiting what they have access to. The role only needs to be added during the activation hook of our plugin, and then we can begin assigning users to the role.
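
    A sketch of that activation hook, with an illustrative role name and a deliberately tiny capability set:

    register_activation_hook( __FILE__, function () {
        add_role(
            'scud_team_member',
            'Team Member',
            array(
                'read' => true, // Just enough to log in and edit their own profile.
            )
        );
    } );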

    The last piece of information we need to include is custom user meta fields. The team members have extra information, like their title and regions, which we need to be able to save. We can render these fields via the edit_user_profile hook.
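
    A sketch of rendering and saving one such field (the meta key and markup are illustrative):

    // Show a "Title" field on both your-own-profile and edit-other-user screens.
    add_action( 'show_user_profile', 'scud_render_profile_fields' );
    add_action( 'edit_user_profile', 'scud_render_profile_fields' );
    function scud_render_profile_fields( $user ) {
        $title = get_user_meta( $user->ID, 'scud_title', true );
        printf(
            '<label for="scud_title">Title</label> <input type="text" id="scud_title" name="scud_title" value="%s" />',
            esc_attr( $title )
        );
    }
    
    // Save the field when the profile is updated.
    add_action( 'personal_options_update', 'scud_save_profile_fields' );
    add_action( 'edit_user_profile_update', 'scud_save_profile_fields' );
    function scud_save_profile_fields( $user_id ) {
        if ( current_user_can( 'edit_user', $user_id ) && isset( $_POST['scud_title'] ) ) {
            update_user_meta( $user_id, 'scud_title', sanitize_text_field( wp_unslash( $_POST['scud_title'] ) ) );
        }
    }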

    The end goal here is that any team member could log in to edit their own profile information. If they don’t have their login, an administrator would also be able to update the contact information for them.

    Display

    Option 1: DataViews (Stretch)

    WordPress has recently introduced DataViews to core. This allows data sets (such as users) to not only be displayed, but also searched, filtered, and sorted. 

    DataViews has multiple layout options, including a table, grid, or list. Since each DataViews field renders its component in custom React, we can customize the layout of the user information and how it is displayed.

    Additionally, since not all “fields” have to be displayed to be used for filtering, we can use the taxonomies to filter the users.

    The goal here is to allow users to be added to the website and then be automatically added to the Sales Team page without needing to edit the page content itself.

    If we contained the DataViews within a Block (for Divi or Gutenberg) then we would also be able to add custom content above or below the block without having to touch code.

    Divi can now display Gutenberg blocks. So this means we only need to create the block as part of Gutenberg.

    Option 2: Page Template

    This is basically the same principle as option 1, but with a custom page template. This is less flexible and doesn’t allow for searching, sorting, or filtering.

    Option 3: Divi Drag and Drop Page

    This option would be the most similar to the current implementation (the layout being created within the page editor). While we would still have the benefits of better user information control, it wouldn’t solve the long-term ease-of-use problem. It would likely be the least time consuming, though.

    I am developing a boilerplate version of this plugin here: https://github.com/JessBoctor/simple-custom-user-display

    The goal is not to create a WYSIWYG plugin which allows a user to install and customize the new user role and display. Rather, it is a lightweight plugin with enough instructions on how to customize things that a dev can pick it up and make it their own.

    Which is what I will do for the company I am working with 🙂

    PS Before and after images will be coming soon once I have the website refresh completed.