Needle in a 930M Member Haystack: People Search AI @LinkedIn https://www.infoq.com/presentations/people-search-ai-linkedin/

Good intro to Search 101.

Search as a recommendation system

  1. retrieve docs
  2. score/rank docs
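A minimal sketch of the two steps, assuming a toy in-memory corpus and plain term overlap as the score. Both are illustrative stand-ins, not LinkedIn's actual retrieval or ranking models.

```python
def retrieve(query: str, docs: dict[str, str]) -> list[str]:
    """Step 1: keep docs sharing at least one term with the query."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in docs.items()
            if terms & set(text.lower().split())]


def rank(query: str, doc_ids: list[str], docs: dict[str, str]) -> list[str]:
    """Step 2: order retrieved docs by number of matched query terms."""
    terms = set(query.lower().split())
    scores = {d: len(terms & set(docs[d].lower().split())) for d in doc_ids}
    # sorted() stays stable with reverse=True, so ties keep retrieval order
    return sorted(doc_ids, key=lambda d: scores[d], reverse=True)


docs = {
    "b": "data scientist at acme",
    "a": "software engineer at acme",
    "c": "chef in paris",
}
candidates = retrieve("software engineer acme", docs)  # ["b", "a"]
print(rank("software engineer acme", candidates, docs))  # ['a', 'b']
```

Retrieval is cheap and only decides membership; ranking then spends more effort ordering the smaller candidate set.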

Architecture

  1. L0: search candidate generation (retrieval)
    • query tagging:
      • named entity recognition (from an NLP model) for entities such as name, title, company, location, etc.
      • knowledge graph: e.g. "Meta" and "Facebook" are tagged to the same companyId
      • helps ranking: if all tags match, the result should be boosted to the top
    • rewritten query
      • inverted index
      • boolean operators, tags, optimization
      • e.g.:
        • possibly query expansion, such as name clustering (e.g. Chris to Christopher)
        • multi-language match
  2. L1: ranking by query relevance
    • CNN for text embeddings
    • deep features / wide features
    • input: O(1000) candidates
    • output: O(100) candidates
    • offline
  3. L2: re-ranking; picks the top K from the L1 results using a BERT model, focusing on personalization
    • open-source library: DeText, a deep text understanding framework for NLP-related ranking, classification, and language generation tasks
      • https://github.com/linkedin/detext/tree/master
    • factors: the searcher, the query, and the docs retrieved by L0
    • meaningful clicks: follow-up actions such as connecting, messaging, etc.
    • what makes a good loss function for training the ranker: Learning to Rank (LTR)
      • pointwise learning
      • pairwise learning
      • listwise learning
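The query-tagging idea under L0 can be sketched as a toy tagger. It assumes hard-coded lexicons in place of the NER model and a made-up dict (`COMPANY_IDS`, with invented ids) in place of the knowledge graph; the 2x boost for a full tag match is likewise an arbitrary assumption.

```python
# Hypothetical lexicons standing in for a trained NER model.
TITLES = {"engineer", "recruiter", "designer"}
LOCATIONS = {"seattle", "london"}
# Toy knowledge graph: surface forms mapped to one canonical company id (ids are made up).
COMPANY_IDS = {"meta": 1001, "facebook": 1001, "linkedin": 2002}


def tag_query(query: str) -> dict:
    """Tag each query token as company, title, location, or plain keyword."""
    tags: dict = {}
    for token in query.lower().split():
        if token in COMPANY_IDS:
            tags["company"] = COMPANY_IDS[token]  # normalize to the canonical id
        elif token in TITLES:
            tags["title"] = token
        elif token in LOCATIONS:
            tags["location"] = token
        else:
            tags.setdefault("keywords", []).append(token)
    return tags


def tag_match_boost(tags: dict, profile: dict, boost: float = 2.0) -> float:
    """Return `boost` if every tagged field matches the member profile, else 1.0."""
    fields = [f for f in ("company", "title", "location") if f in tags]
    if fields and all(profile.get(f) == tags[f] for f in fields):
        return boost
    return 1.0


member = {"company": 1001, "title": "engineer", "location": "seattle"}
print(tag_query("Meta engineer") == tag_query("facebook engineer"))  # True
print(tag_match_boost(tag_query("facebook engineer seattle"), member))  # 2.0
print(tag_match_boost(tag_query("meta engineer london"), member))  # 1.0
```

Because both "Meta" and "Facebook" resolve to the same id, the match check compares ids rather than strings, so either spelling boosts the same members.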
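The rewritten-query step under L0 can be sketched as a boolean query over an inverted index. This assumes whitespace tokenization and a hypothetical hand-written `NAME_VARIANTS` table standing in for learned name clustering: each query term becomes an OR over its variants, and the clauses are ANDed together.

```python
from collections import defaultdict

# Hypothetical name-clustering table; a real system would learn such variants.
NAME_VARIANTS = {"chris": {"chris", "christopher"}}


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: term -> ids of docs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def rewrite(query: str) -> list[set[str]]:
    """Rewrite the query as an AND of OR-clauses, one clause per term plus its variants."""
    return [NAME_VARIANTS.get(t, {t}) for t in query.lower().split()]


def search(index: dict[str, set[int]], clauses: list[set[str]]) -> set[int]:
    """Union postings inside a clause (OR), intersect across clauses (AND)."""
    result = None  # stays None until the first clause is processed
    for variants in clauses:
        matches = set().union(*(index.get(v, set()) for v in variants))
        result = matches if result is None else result & matches
    return result or set()


docs = {1: "Christopher Lee engineer", 2: "Chris Wong engineer", 3: "Chris Lee designer"}
index = build_index(docs)
print(sorted(search(index, rewrite("chris engineer"))))  # [1, 2]
```

Without the expansion, the query "chris engineer" would only match doc 2; the variant clause also pulls in the "Christopher" profile.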
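The L0, L1, L2 stages can be sketched as a cascade where each stage narrows the list for the next. `l1_score` and `l2_score` are hypothetical callables standing in for the CNN relevance model and the BERT-based personalized model; the cutoff of 100 mirrors the O(100) L1 output from the notes, and `top_k=10` is an assumed value.

```python
import heapq
from typing import Callable, Sequence


def cascade(candidates: Sequence[str],
            l1_score: Callable[[str], float],
            l2_score: Callable[[str], float],
            l1_keep: int = 100,
            top_k: int = 10) -> list[str]:
    """Rank L0 candidates in two stages and return the final top K."""
    # L1: cheaper relevance scorer over every L0 candidate, keep the best l1_keep
    shortlist = heapq.nlargest(l1_keep, candidates, key=l1_score)
    # L2: costlier personalized scorer, run only on the L1 shortlist
    return heapq.nlargest(top_k, shortlist, key=l2_score)


# Toy data: 1000 L0 candidates; L1 prefers high ids, L2 prefers low ids.
members = [f"m{i}" for i in range(1000)]
final = cascade(members,
                l1_score=lambda m: int(m[1:]),
                l2_score=lambda m: -int(m[1:]))
print(final[:3])  # ['m900', 'm901', 'm902']
```

The expensive scorer only ever sees the shortlist, which keeps L2 affordable; the trade-off is that a candidate dropped by L1 (here "m0", L2's overall favorite) can never be recovered by L2.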
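A minimal sketch of the three LTR loss families for one query's candidate list. The specific losses are common textbook choices picked for illustration (sigmoid cross-entropy for pointwise, a RankNet-style logistic loss for pairwise, a ListNet-style top-1 cross-entropy for listwise), not necessarily what LinkedIn uses; `scores` are model outputs and `labels` are relevance labels, e.g. derived from meaningful clicks.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_loss(scores, labels):
    """Pointwise: binary cross-entropy on each doc independently (labels are 0/1)."""
    # BCE with logits: -[y*log(sigmoid(s)) + (1-y)*log(1-sigmoid(s))] = softplus(s) - y*s
    return sum(softplus(s) - y * s for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores, labels):
    """Pairwise (RankNet-style): logistic loss on each pair where doc i should outrank doc j."""
    pairs = [(i, j) for i in range(len(labels)) for j in range(len(labels))
             if labels[i] > labels[j]]
    if not pairs:
        return 0.0
    # log(1 + exp(-(s_i - s_j))) == softplus(s_j - s_i)
    return sum(softplus(scores[j] - scores[i]) for i, j in pairs) / len(pairs)


def log_softmax(xs):
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def listwise_loss(scores, labels):
    """Listwise (ListNet top-1 style): cross-entropy between softmax(labels) and softmax(scores)."""
    target = [math.exp(v) for v in log_softmax(labels)]
    return -sum(t * lp for t, lp in zip(target, log_softmax(scores)))


labels = [1, 0, 0]
good = [2.0, 0.0, -2.0]  # relevant doc scored highest
bad = [-2.0, 0.0, 2.0]   # relevant doc scored lowest
for loss in (pointwise_loss, pairwise_loss, listwise_loss):
    print(loss.__name__, loss(good, labels) < loss(bad, labels))  # True for each
```

The difference is what each loss looks at: one doc at a time, ordered pairs of docs, or the score distribution over the whole list.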

Infrastructure

  • Pro-ML platform: online/offline model training and model serving