Product Ingestion & Normalization
Hundreds of retailer inventory data feeds, containing millions of consumer products, are downloaded every day and ingested into our ecosystem where they enter a two-step normalization process.
The initial step performs a set of fundamental normalization actions. It first fits the many different variations of header values (e.g., name, price, description) across retailers to our proprietary schema. Entity resolution is then performed on products that are identical in every way but size and color, with the different available sizes and colors encoded into the single row for the product. In addition, product prices are compared to an internal component that tracks price history across all products previously processed, allowing us to determine whether the product has a price that is lower than we have seen previously, and therefore is on sale. The individual products are saved to a datastore, and no retailer data is removed in this step.
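The first step can be sketched in a few lines. This is a minimal illustration, not the production pipeline: the alias table, the field names, and the function names are all assumptions, and the price history is reduced to a plain list of previously seen prices.

```python
# Hypothetical mapping from retailer-specific headers to a unified schema.
HEADER_ALIASES = {
    "name": "name", "title": "name", "product_name": "name",
    "price": "price", "cost": "price",
    "description": "description", "desc": "description",
    "size": "size",
    "color": "color", "colour": "color",
}

def normalize_headers(row):
    """Rename known header variants; unknown fields are kept, not dropped."""
    return {HEADER_ALIASES.get(k.strip().lower(), k): v for k, v in row.items()}

def collapse_variants(rows):
    """Merge rows that differ only in size and color into one row per product."""
    grouped = {}
    for row in rows:
        # Everything except size and color identifies the product.
        key = tuple(sorted((k, v) for k, v in row.items()
                           if k not in ("size", "color")))
        product = grouped.setdefault(key, {**dict(key), "sizes": [], "colors": []})
        for field, bucket in (("size", "sizes"), ("color", "colors")):
            value = row.get(field)
            if value is not None and value not in product[bucket]:
                product[bucket].append(value)
    return list(grouped.values())

def is_on_sale(price, price_history):
    """True when the price is lower than every previously seen price."""
    return bool(price_history) and price < min(price_history)
```

Grouping on every field except size and color mirrors the "identical in every way but size and color" rule; a real system would likely need fuzzier matching.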
The second step performs more specialized normalization. An ensemble of models predicts a category for the product from among the hundreds of nested proprietary categories in our taxonomy. This allows us to organize our entire product base categorically and hierarchically, and enables cross-merchant comparisons that would otherwise be impossible. A side benefit is that the predicted category is used alongside the other product information to generate a proprietary size value that normalizes the extreme variability found in size signals across retailers. This proprietary size enables us to place a 'Petite' sized t-shirt from Retailer X alongside a 'S' sized t-shirt from Retailer Y. Finally, product metadata is reduced for API consumption before being saved to a separate datastore, so that only information we eventually surface or use is retained.
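As a rough illustration of category-aware size normalization, the sketch below uses a lookup keyed on category and retailer label. The table contents, the bucket names, and the function name are hypothetical; the actual proprietary size scale is not described here.

```python
# Hypothetical lookup: (category, retailer size label) -> normalized bucket.
# The category matters because the same label can mean different things
# for different kinds of product.
SIZE_SCALE = {
    ("tops", "petite"): "small",
    ("tops", "s"): "small",
    ("tops", "small"): "small",
    ("tops", "m"): "medium",
    ("tops", "medium"): "medium",
}

def normalize_size(category, raw_size):
    """Return the normalized bucket, or None when the label is unknown."""
    return SIZE_SCALE.get((category.strip().lower(), raw_size.strip().lower()))
```

With this table, a 'Petite' top and a 'S' top resolve to the same bucket and can be shown side by side.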
Computing A Consumer’s Taste
The PreciseTarget Taste Graph is the foundation for our data products. But how do we formulate a consumer’s taste? We consider taste to be a data abstraction layer over the consumer’s purchase history. The goal of the system is to learn about consumers by observing the metadata of the products they purchase. When retailers provide PreciseTarget with their catalog assortments, they also provide 20 to 30 metadata items for each product in their assortment. Our system wants to know the attributes of the products a consumer is purchasing, ranging from the product’s brand to its fabric and materials, colors, cleaning instructions, unique style attributes, and price. By combining the attributes of the purchased products, the system begins to understand the consumer’s taste. This allows for far greater insight into the types of products that fit a consumer’s individualized taste.
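One simple way to picture "combining the attributes of the purchased products" is a frequency profile over attribute values. The sketch below is an assumption for illustration only; it is not the Taste Graph itself, and the attribute names and the function name are hypothetical.

```python
from collections import Counter

def taste_profile(purchases, attributes=("brand", "material", "color")):
    """Share of a consumer's purchases that carry each attribute value."""
    counts = Counter()
    for product in purchases:
        for attr in attributes:
            value = product.get(attr)
            if value is not None:
                counts[(attr, value)] += 1
    total = len(purchases)
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}
```

A consumer whose purchases are all from one brand gets a share of 1.0 for that brand, while an attribute seen in half of the purchases gets 0.5.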
PreciseTarget does not accept personally identifiable information (PII) about consumers. Instead, we de-identify all consumers with a "Synthetic Universal Identifier." The SUI is a random code created by a third-party processor to protect business partner data. The SUI is not algorithmically created, and only the third-party processor holds the match key. Therefore, PreciseTarget cannot decrypt the SUI.
Additionally, PreciseTarget makes an effort to strip all selling-merchant information (SMI) when processing partner retailer transactions. A partner wishing to submit data to PreciseTarget must first give the data to a third-party processor. The partner must certify to both PreciseTarget and the processor that it has the legal right to share the data. The processor will append the desired data to the SUI and submit it to PreciseTarget.
It is important to note that we aren’t interested in where a product was purchased; rather, we want to learn about the taste properties of the product that was purchased. By de-identifying the retailers in our system, we further protect the confidentiality of the provided data. Example: The retailer or business partner submits transaction data to a third-party processor. The third-party processor removes personal identifiers and matches this data to PreciseTarget SUIs and audience segments. The third-party processor then delivers the de-identified data to PreciseTarget.
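The flow in the example can be sketched as follows. The class, the field names (`email`, `merchant`), and the SUI length are assumptions made for illustration; the only properties taken from the text are that the SUI is random rather than derived from the PII, and that the match key stays with the processor.

```python
import secrets

class ThirdPartyProcessor:
    """Hypothetical stand-in for the trusted intermediary."""

    def __init__(self):
        # Match key (PII -> SUI). It never leaves the processor.
        self._match_key = {}

    def sui_for(self, pii):
        # The SUI is a random token, not a hash or encryption of the PII,
        # so it cannot be reversed without the match key.
        if pii not in self._match_key:
            self._match_key[pii] = secrets.token_hex(16)
        return self._match_key[pii]

    def deidentify(self, transaction, pii_field="email"):
        """Replace the personal identifier with the consumer's SUI."""
        record = {k: v for k, v in transaction.items() if k != pii_field}
        record["sui"] = self.sui_for(transaction[pii_field])
        return record

def strip_selling_merchant(record, merchant_field="merchant"):
    """Drop the selling-merchant field from a de-identified record."""
    return {k: v for k, v in record.items() if k != merchant_field}
```

The same consumer always maps to the same SUI, so purchases can be linked over time without the receiving side ever seeing the identifier.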
Despite the lack of PII, when partnering with PreciseTarget we can identify audiences and customer lists in a variety of ways. We also work with third parties such as LiveRamp, who can offer a number of different matching keys based on the data you have available. Finally, our third-party processors can match on PII, as the entirety of that data remains safely siloed with the trusted intermediary.
There are many obstacles in the way of a single, cross-merchant view of product catalog data: aggregating the product catalog feeds at the basic level, understanding the information in them, and using this information to extract the product entities encoded within.
The first normalizing challenge to overcome is the most basic: obtaining the product catalogs of merchants. These must be collected where available, at the frequency at which they are available, in different formats and schemas. The highest-frequency feeds are typically made available only twice a day. With updates 12 hours apart, even these catalogs are 6 hours out of date on average. Processing latency must therefore be kept to an absolute minimum. Complicating the need for low-latency processing is the extreme asymmetry of merchant product catalog breadth. Most merchants have narrow offerings and small catalogs, but a very small number of merchants have very wide offerings and correspondingly high-volume catalogs. Low-latency processing must therefore be maintained in an environment of highly erratic data volumes. Each of these catalogs must then be understood in its particular schema and mapped to a single, unified schema.
Even when differing schemas are unified, the underlying merchant data is still very different. Retailers typically rewrite manufacturers’ product names, descriptions, and materials according to their own style guides. Colors are frequently renamed to correspond with marketing campaigns or to match other products. For apparel, sizing is encoded as free text using a variety of notations. These different notations must each be parsed, and the quantitative size data extracted. Merchants also include information placing the product within a category in their taxonomy. Merchant taxonomies differ semantically, according to merchant style guides, but also structurally. A merchant that specializes in denim will have a developed, granular taxonomy of denim products, while a merchant with a broader catalog will have a broader taxonomy of which denim is only a small part. In order to normalize product category, these taxonomies must be mapped to a single, unified taxonomy that is a superset of all merchant taxonomies. An additional complication here is that product catalogs do not explicitly contain their merchant’s taxonomies, but rather implicitly contain them as the union of all merchant product categories.
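Recovering a merchant's implicit taxonomy as the union of its product categories can be sketched like this. The sketch assumes each product carries its category as a delimited path string; the `>` separator and the function name are hypothetical, and real feeds vary.

```python
def build_taxonomy(category_paths, separator=">"):
    """Nested dict formed by the union of all category paths in a catalog."""
    root = {}
    for path in category_paths:
        node = root
        for part in path.split(separator):
            part = part.strip()
            if part:
                # Reuse the existing branch, or create it on first sight.
                node = node.setdefault(part, {})
    return root
```

A denim specialist's catalog yields a deep tree under its denim node, while a broad merchant's catalog yields a wide, shallow tree. Mapping both onto one unified taxonomy is a separate step that this sketch does not attempt.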
After this normalization, products are still only understood in the context of the merchant offering them. To get to a cross-merchant view, the records representing the products must be matched to other records representing the same product from the same merchant and different merchants. A UPC uniquely identifies a product, but typically at too granular a level of resolution. A product offered in multiple sizes should still be understood to be one product, despite these sizes being assigned different UPCs. Merchants group individual UPCs into SKUs, but these identifiers are merchant-specific. They cannot be used to match products across different merchants.
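To make the matching problem concrete, the sketch below groups UPC-level records into products using a merchant-independent key. The key (folded brand plus folded name) is a deliberately naive assumption, as are the field and function names. Because retailers rewrite product names, a real matcher would need something more robust than exact comparison.

```python
from collections import defaultdict

def _fold(text):
    """Lowercase and collapse whitespace so trivial variations match."""
    return " ".join(text.lower().split())

def style_key(record):
    """Hypothetical matching key that ignores merchant, UPC, and SKU."""
    return (_fold(record["brand"]), _fold(record["name"]))

def group_products(records):
    """Map each style key to the UPCs and merchants that carry it."""
    products = defaultdict(lambda: {"upcs": set(), "merchants": set()})
    for record in records:
        product = products[style_key(record)]
        product["upcs"].add(record["upc"])
        product["merchants"].add(record["merchant"])
    return dict(products)
```

Each resulting group represents one product, with its size-level UPCs and every merchant that offers it.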
Each of these challenges, from the logistics of ingesting a diverse set of feeds to the domain-specific understanding of apparel construction and sizing, must be overcome in order to arrive at the cross-merchant view of the product universe.
PreciseTarget partners can choose from two delivery options when subscribing to our data products: bulk appends and real-time APIs. In a bulk append, the partner transfers a file to PreciseTarget, and PreciseTarget appends data to all matching rows. The real-time API delivery option lets our partners build robust, deep integrations in which PreciseTarget enriches customer data in real time.
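A bulk append behaves like a left join on the matching key. The sketch below assumes rows are matched on an `sui` column and that the enrichment data is available as a dictionary keyed by SUI; both the names and the data shapes are hypothetical.

```python
def bulk_append(partner_rows, enrichment, key="sui"):
    """Append enrichment fields to every partner row whose key matches.

    Rows without a match are returned unchanged, so the output has the
    same number of rows as the partner's file.
    """
    appended = []
    for row in partner_rows:
        extra = enrichment.get(row.get(key), {})
        appended.append({**row, **extra})
    return appended
```

Keeping unmatched rows in place means the partner can line the returned file up against the one it sent.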
Product Taste Audiences cover a number of product categories. For each category, they identify sets of dominant brands that share an affinity with PreciseTarget tastes, along with the consumers who have the highest affinities for those brands. This allows advertisers to identify both the consumers to target and the products to target them with, even when the advertiser has relatively few purchases from which to understand the consumer.
PreciseTarget can enrich Product Taste Audiences with additional targeting features, like a consumer’s demographics or geographic location. This allows advertisers to combine Product Taste Audiences with more traditional marketing approaches to narrowly target valuable prospects.