Entity Management is a Middesk product that helps small businesses stay compliant with their payroll taxes. Its core function is to help register businesses with different state agencies, and many of these state agencies communicate important information - like account IDs, rate changes, and other notices - via snail mail. We, then, are responsible for collecting and processing this mail to extract the relevant information.
To extract this information as efficiently as possible, we use a third-party partner to scan and upload the envelope and contents of every piece of mail. Our operations team then manually identifies and tags the relevant registration for each piece of mail and extracts any information pertinent to that registration. Ideally, both parts of this process would be automated, but for now we’ve started by automating the mail-to-registration tagging.
Let’s say a California company has an employee in Georgia, and has used Middesk to register with the Georgia Department of Labor for unemployment insurance withholdings. The State Unemployment Insurance (SUI) rate changes for the company, and the Georgia Department of Labor mails a letter to them with the rate change. It’s the business’s responsibility to reflect that new rate when they run payroll for their Georgia-based employee and therefore our responsibility to communicate that rate change to the business.
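To identify whether an incoming piece of mail is for this company’s registration in Georgia, we need to determine two things: first, which company the mail is addressed to, and second, which of that company’s registrations it concerns.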
Step one is actually quite easy. Our mail partner attaches the recipient company to every scan we receive, so we can match it against our internal company list automatically. Step two, on the other hand, isn’t so simple. To identify which of the company’s registrations a piece of mail is for, we need both the sender department and the sender state (“Department of Labor,” “Georgia”).
Why both? Well, let’s say the company has registrations in Maine, Georgia, and New York. All of these states have a Department of Labor that handles unemployment insurance registrations; so if a piece of mail came from the DoL and we didn’t know the state, we couldn’t programmatically determine what registration the piece of mail was for. We would need manual review of the envelope to identify the sender state.
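The ambiguity is easy to see in a minimal sketch. The `Registration` record, the `find_registrations` helper, and the sample company below are hypothetical illustrations, not our actual schema:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Registration:
    # Hypothetical record; the field names are assumptions for illustration.
    company: str
    state: str       # two-letter state code
    department: str  # agency the registration was filed with


REGISTRATIONS = [
    Registration("Acme Inc", "ME", "Department of Labor"),
    Registration("Acme Inc", "GA", "Department of Labor"),
    Registration("Acme Inc", "NY", "Department of Labor"),
]


def find_registrations(
    company: str, department: str, state: Optional[str] = None
) -> list[Registration]:
    """Return the company's registrations matching the sender department
    and, when it is known, the sender state."""
    return [
        r
        for r in REGISTRATIONS
        if r.company == company
        and r.department == department
        and (state is None or r.state == state)
    ]


# Department alone is ambiguous: all three registrations match.
ambiguous = find_registrations("Acme Inc", "Department of Labor")

# Department plus state pins down exactly one registration.
exact = find_registrations("Acme Inc", "Department of Labor", state="GA")
```

With only the department, the lookup returns three candidates and a person has to step in. With the state added, it returns a single registration that can be tagged automatically.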
To extract the sender name and state from an envelope manually, a person would just look at … well, the sender name and address on the envelope. That’s fine for a one-off scenario. But we need to handle thousands of these scenarios a month.
To recreate this programmatically, we turned to Optical Character Recognition (OCR) and Artificial Intelligence (AI).
OCR is a process through which a computer can scan an image and return the text contained within it. Think of it as a way for a computer to ‘read’ an image or document. Once the image text has been returned, all sorts of logic can be applied to it programmatically.
We use OCR to transform the image of the envelope into processable text. Then we make a request to OpenAI to identify the sender name and address from the extracted OCR text. Once we have the response from OpenAI, we use the recipient data to match against a Middesk company record, and we use the sender data to look up the company’s corresponding registration.
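The flow can be sketched roughly as follows. This is a simplified illustration rather than our production code: the prompt wording, the JSON reply shape, and the `extract_sender` function are all assumptions, and the OCR and model calls are passed in as plain callables (stubbed here) so the sketch does not depend on any particular OCR engine or API client. For brevity it asks only for the sender name and state rather than the full address.

```python
import json
from typing import Callable, Optional

# Hypothetical prompt; the real wording is not shown in this post.
PROMPT_TEMPLATE = (
    "Below is OCR text from a scanned envelope. Return a JSON object with "
    'the keys "sender_name" and "sender_state" (two-letter code). '
    "Use null for anything you cannot find.\n\n"
    "OCR text:\n{ocr_text}"
)


def extract_sender(
    image_bytes: bytes,
    run_ocr: Callable[[bytes], str],
    ask_model: Callable[[str], str],
) -> Optional[dict]:
    """OCR the envelope, ask the model for the sender, and parse its reply.

    Returns None when the reply is unusable, so the piece of mail can fall
    back to manual review.
    """
    ocr_text = run_ocr(image_bytes)
    reply = ask_model(PROMPT_TEMPLATE.format(ocr_text=ocr_text))
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    name, state = parsed.get("sender_name"), parsed.get("sender_state")
    if not isinstance(name, str) or not isinstance(state, str):
        return None
    if not name.strip() or not state.strip():
        return None
    return {"sender_name": name.strip(), "sender_state": state.strip().upper()}


# Stubs standing in for a real OCR engine and a real model call.
def fake_ocr(_image: bytes) -> str:
    return "GEORGIA DEPARTMENT OF LABOR\nATLANTA GA\n\nACME INC"


def fake_model(_prompt: str) -> str:
    return '{"sender_name": "Department of Labor", "sender_state": "ga"}'


result = extract_sender(b"", fake_ocr, fake_model)
```

Treating an unparseable or incomplete reply as "no match" keeps the automation conservative: anything the model cannot answer cleanly is left for a person to tag.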
The results from this automation have been exceptional. Previously, our only automatic matching relied on the sender field passed along by our mail partner. With the new approach, our automatic-parse rate jumped from 41% to 91% overnight, and our error rate dropped from 2.5% to 1.5%. Note that we define “error rate” as the share of mail pieces that were automatically tagged onto an incorrect registration (always within the correct company, just the wrong state). We’ve been able to keep reducing this error rate by improving the AI prompt, adding aliases for businesses so our partner is better able to identify the recipient, and improving our internal prioritization logic for registration matching.
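For readers who want to track similar numbers, here is one way the two rates could be computed. The outcome labels and the `tagging_metrics` helper are hypothetical, the sample data is a toy example rather than our real volumes, and the sketch assumes both rates are measured against all incoming mail, which is only one possible choice of denominator.

```python
from collections import Counter


def tagging_metrics(outcomes: list[str]) -> dict[str, float]:
    """Compute the auto-parse rate and error rate from per-piece outcomes.

    Each outcome is one of (hypothetical labels):
      "auto_correct" - tagged automatically onto the right registration
      "auto_wrong"   - tagged automatically onto the wrong registration
      "manual"       - could not be tagged automatically
    Assumption: both rates use all incoming mail as the denominator.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mail to measure")
    auto_tagged = counts["auto_correct"] + counts["auto_wrong"]
    return {
        "auto_parse_rate": auto_tagged / total,
        "error_rate": counts["auto_wrong"] / total,
    }


# Toy data: 7 correct auto-tags, 1 wrong auto-tag, 2 left for manual review.
sample = ["auto_correct"] * 7 + ["auto_wrong"] + ["manual"] * 2
metrics = tagging_metrics(sample)
```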
The release of OpenAI’s GPT-4 is exciting for many reasons, including its ability to handle images as inputs. When that feature is released, we look forward to integrating it into our system and bypassing the OCR step of the process. We are also hopeful that we will be able to use it to categorize and extract the pertinent content from a piece of mail automatically.
To learn more about how Middesk can help optimize your payroll compliance process, talk to our sales team and set up a demo to see exactly how it can help you register in states within minutes and onboard employees without delays.