ML systems

  • Categorize the system components along two axes:
    1. offline vs. online
    2. candidate retrieval vs. ranking
  • Offline environment: batch processes that handle:
    1. training models
    2. creating embeddings for catalog items
    3. building an approximate nearest neighbors (ANN) index or a knowledge graph
    4. loading item and user data into a feature store
  • Online environment: a service that uses the offline artifacts to serve requests via:
    1. Preprocessing and converting the input into an embedding
    2. Candidate retrieval
    3. Ranking
    4. Postprocessing and returning the result
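The four online steps above can be sketched end to end. This is a toy illustration, not a production design: the random item embeddings, the popularity feature, the 0.8/0.2 score weights, and all function names are assumptions made for the example, and exact search stands in for a real ANN index.

```python
import numpy as np

# Hypothetical offline artifacts: unit-norm item embeddings and a popularity feature.
rng = np.random.default_rng(0)
ITEM_EMB = rng.normal(size=(1000, 16))
ITEM_EMB /= np.linalg.norm(ITEM_EMB, axis=1, keepdims=True)
ITEM_POPULARITY = rng.random(1000)

def preprocess(item_id):
    """Step 1: convert the input (here, an item id) into an embedding."""
    return ITEM_EMB[item_id]

def retrieve(query_emb, k=100):
    """Step 2: coarse retrieval by cosine similarity (exact search standing in for ANN)."""
    scores = ITEM_EMB @ query_emb
    return np.argsort(-scores)[:k]

def rank(query_emb, candidates):
    """Step 3: re-score the candidates with an extra feature and sort them."""
    scores = 0.8 * (ITEM_EMB[candidates] @ query_emb) + 0.2 * ITEM_POPULARITY[candidates]
    return candidates[np.argsort(-scores)]

def postprocess(ranked, query_item, n=10):
    """Step 4: drop the query item itself and keep the top n."""
    return [int(i) for i in ranked if i != query_item][:n]

def recommend(item_id, n=10):
    query_emb = preprocess(item_id)
    return postprocess(rank(query_emb, retrieve(query_emb)), item_id, n)

result = recommend(42)
```

Only the ranking step touches the extra feature, which mirrors the split between a cheap retrieval pass and a richer ranking pass.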
  • Candidate retrieval: fast but coarse step to narrow down millions of items into hundreds of candidates.
    • Trade precision for efficiency (99.99% reduction from 1M to 100)
    • Convert the input (an item or a search query) into an embedding
    • Use ANN to find similar items (some systems use graphs or decision trees instead)
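One way to make the retrieval step concrete is random-hyperplane locality-sensitive hashing, a simple ANN technique. The sketch below is an assumption-laden illustration (random data, 8 hash bits, a single hash table, hypothetical names such as `bucket_key` and `ann_search`); production systems typically use dedicated ANN libraries.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
n_items, dim, n_bits = 10_000, 32, 8

# Hypothetical catalog of unit-norm item embeddings.
items = rng.normal(size=(n_items, dim))
items /= np.linalg.norm(items, axis=1, keepdims=True)

# Random hyperplanes: each one contributes one bit of the hash.
planes = rng.normal(size=(n_bits, dim))

def bucket_key(vec):
    """Hash a vector by which side of each hyperplane it falls on."""
    return ((planes @ vec) > 0).tobytes()

# Offline: build the index by grouping item ids per bucket.
buckets = defaultdict(list)
for idx, vec in enumerate(items):
    buckets[bucket_key(vec)].append(idx)

def ann_search(query, k=10):
    """Online: score only the items that share the query's bucket."""
    cand = np.array(buckets.get(bucket_key(query), []), dtype=int)
    if cand.size == 0:
        return []
    sims = items[cand] @ query
    return [int(i) for i in cand[np.argsort(-sims)[:k]]]

neighbors = ann_search(items[123])
```

The search compares the query against one bucket rather than all 10,000 items. That is the precision-for-efficiency trade: true neighbors that hash to a different bucket are missed.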
  • Ranking: slower but precise step to score and rank candidates.
    • Has room for more features and a more complex model than the retrieval step
    • Features can include user data, contextual information, etc.
    • The model is either a learning-to-rank model or, more often, a classification model.
    • The final output is either a softmax over the catalog of items or a sigmoid giving the likelihood of a user-item interaction.
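The two output formulations can be compared on the same scores. In this minimal sketch the logits and item names are made up for illustration; a real ranker would produce the logits from user, item, and context features.

```python
import numpy as np

def sigmoid(x):
    """Independent probability per candidate (user-item interaction likelihood)."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Probability distribution over all candidates (sums to 1)."""
    z = x - np.max(x)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical model scores (logits) for four candidate items.
candidate_ids = ["item_a", "item_b", "item_c", "item_d"]
logits = np.array([2.0, -1.0, 0.5, 0.0])

p_interact = sigmoid(logits)  # each value lies in (0, 1), independently
p_catalog = softmax(logits)   # values compete and sum to 1

# Both functions are monotonic in the logit, so they give the same ordering.
ranked = [candidate_ids[i] for i in np.argsort(-p_interact)]
```

The ordering is identical under both outputs. They differ in interpretation: the sigmoid scores each user-item pair on its own, while the softmax makes items compete for probability mass.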
  • Alibaba Graph Neural Network based embedding and ranking
    • offline - candidate retrieval
      • Construct a weighted, bidirectional item graph
      • Use the graph to generate item sequences via random walks
      • Learn item embeddings from those sequences via representation learning (word2vec skip-gram)
        • Skip-gram predicts the distribution of context words from a center word
      • Save item-to-item similarity map in a key-value store
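The sequence-generation part of the offline steps above can be sketched as follows. The tiny graph, edge weights, walk length, and window size are illustrative assumptions, not values from Alibaba's system. The resulting (center, context) pairs are what a skip-gram model would train on; the training itself is omitted here.

```python
import random

# Hypothetical weighted, bidirectional item graph: graph[a][b] = weight of edge a -> b.
graph = {
    "phone": {"case": 3.0, "charger": 2.0},
    "case": {"phone": 3.0, "screen_protector": 1.0},
    "charger": {"phone": 2.0, "cable": 2.0},
    "screen_protector": {"case": 1.0},
    "cable": {"charger": 2.0},
}

def random_walk(graph, start, length, rng):
    """Walk the graph, choosing each next item in proportion to edge weight."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], {})
        if not neighbors:
            break  # dead end: stop the walk early
        nxt = rng.choices(list(neighbors), weights=list(neighbors.values()), k=1)[0]
        walk.append(nxt)
    return walk

def skipgram_pairs(sequence, window):
    """Emit (center, context) pairs for every item within the window."""
    pairs = []
    for i, center in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        pairs.extend((center, sequence[j]) for j in range(lo, hi) if j != i)
    return pairs

rng = random.Random(0)
walks = [random_walk(graph, node, length=5, rng=rng) for node in graph for _ in range(3)]
pairs = [p for walk in walks for p in skipgram_pairs(walk, window=2)]
```

Each walk plays the role of a sentence and each item the role of a word, which is why a word2vec-style model can be applied unchanged.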
    • online - candidate retrieval
      • Fetch the latest items the user interacted with (click, like, purchase)
      • Use these items to retrieve candidates via the item-to-item map
      • Pass the candidate list, together with user info, to the ranking model
    • offline - ranking
      • knowledge graph + past user behavior + item data -> adaptive knowledge graph
      • Combine the adaptive knowledge graph with user data (demographics, user-item preferences) into training data for a GNN
      • ATBRG: Adaptive Target-Behavior Relational Graph Network
    • online - ranking
      • Given the candidate list and user info, query the knowledge and feature stores for item and user features
      • Run ATBRG to predict the click probability for each candidate item.
      • Sort and return
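The online ranking flow can be sketched with a placeholder scorer. Everything here is hypothetical: the in-memory dictionaries stand in for the feature store, and `predict_click` is a sigmoid of a dot product standing in for the trained graph network, not an implementation of it.

```python
import math

# Hypothetical feature store contents.
ITEM_FEATURES = {"charger": [0.8, 0.1], "cable": [0.4, 0.9], "case": [0.6, 0.5]}
USER_FEATURES = {"u1": [1.0, 0.2]}

def predict_click(user_vec, item_vec):
    """Placeholder model: sigmoid of a dot product, standing in for the trained ranker."""
    logit = sum(u * i for u, i in zip(user_vec, item_vec))
    return 1.0 / (1.0 + math.exp(-logit))

def rank_candidates(user_id, candidates):
    """Look up features, score every candidate, and sort by click probability."""
    user_vec = USER_FEATURES[user_id]
    scored = [(predict_click(user_vec, ITEM_FEATURES[c]), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda s: -s[0])]

ranking = rank_candidates("u1", ["cable", "case", "charger"])
```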
  • Facebook