Needle in a 930M Member Haystack: People Search AI @LinkedIn https://www.infoq.com/presentations/people-search-ai-linkedin/
Good intro on Search-101.
Search as a recommendation system
- retrieve docs
- score/rank docs
Architecture
- L0, search candidate generation, retrieval
- query tagging:
- named entity recognition (from an NLP model): name, title, company, location, etc.
- knowledge graph: "meta" and "facebook" are tagged to the same companyId
- helps ranking: if all tags match, the doc should be boosted to the top
- rewritten query
- inverted index
- boolean operators, tags, optimization
- eg.
- maybe query expansion, name clustering, eg. chris to christopher
- multi-language match
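Sketch of the L0 flow above (tag the query, expand names, AND the tags against an inverted index). This is a toy illustration, not LinkedIn's implementation: the dictionaries, member records, company ids, and function names are all made up, and a plain dictionary lookup stands in for the NER model.

```python
from collections import defaultdict

# Hypothetical toy data; none of these names or ids come from the talk.
NAME_CLUSTERS = {"chris": {"chris", "christopher"}}          # name clustering
COMPANY_ALIASES = {"meta": "company:1", "facebook": "company:1"}  # knowledge graph
TITLES = {"engineer", "recruiter"}

MEMBERS = {
    1: {"name": "christopher", "title": "engineer", "company": "company:1"},
    2: {"name": "chris", "title": "recruiter", "company": "company:2"},
    3: {"name": "dana", "title": "engineer", "company": "company:1"},
}


def build_index(members):
    """Inverted index: (field, value) -> set of member ids."""
    index = defaultdict(set)
    for member_id, fields in members.items():
        for field, value in fields.items():
            index[(field, value)].add(member_id)
    return index


def tag_query(query):
    """Tag each token with a field; a dictionary lookup stands in for NER."""
    tags = []
    for token in query.lower().split():
        if token in COMPANY_ALIASES:
            tags.append(("company", {COMPANY_ALIASES[token]}))
        elif token in TITLES:
            tags.append(("title", {token}))
        else:
            # Untagged tokens are treated as names and expanded to their cluster.
            tags.append(("name", NAME_CLUSTERS.get(token, {token})))
    return tags


def retrieve(index, tags):
    """Boolean retrieval: OR within a tag's expansions, AND across tags."""
    result = None
    for field, values in tags:
        matches = set()
        for value in values:
            matches |= index.get((field, value), set())
        result = matches if result is None else result & matches
    return result or set()


INDEX = build_index(MEMBERS)
print(retrieve(INDEX, tag_query("chris facebook")))  # {1}
```

"chris" expands to {chris, christopher} and "facebook" resolves to the same company id as "meta", so only member 1 survives the AND.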
- L1 ranking, query relevance rank
- CNN for text embeddings
- deep features / wide features
- input O(1000)
- output O(100)
- offline
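Sketch of the L1 funnel (O(1000) candidates in, O(100) out). Assumptions: embeddings are given as plain vectors (no CNN here), the score is just a deep dot product plus a linear "wide" term, and the field names `embedding` / `wide_features` are hypothetical. The real model is certainly richer than this.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l1_score(query_emb, doc, wide_weights):
    """Deep part (embedding similarity) plus wide part (linear over features)."""
    deep = dot(query_emb, doc["embedding"])
    wide = dot(wide_weights, doc["wide_features"])
    return deep + wide


def l1_rank(query_emb, candidates, wide_weights, k=100):
    """Keep the k best-scoring candidates, highest score first."""
    return heapq.nlargest(
        k, candidates, key=lambda doc: l1_score(query_emb, doc, wide_weights)
    )


# Synthetic candidates standing in for the L0 output.
candidates = [
    {"id": i, "embedding": [i / 1000, 0.0], "wide_features": [1.0 if i % 2 == 0 else 0.0]}
    for i in range(1000)
]
top = l1_rank([1.0, 0.0], candidates, wide_weights=[0.5])
print(len(top), top[0]["id"])  # 100 998
```

`heapq.nlargest` avoids sorting all candidates when only the top slice goes on to L2.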
- L2 re-ranking, pick topK from L1 results, BERT model, focus on personalization
- open sourced lib: DeText is a Deep Text understanding framework for NLP related ranking, classification, and language generation tasks.
- https://github.com/linkedin/detext/tree/master
- factors: search user, query, L0 retrieved docs
- meaningful clicks: clicks with a follow-up action like connecting, chatting, etc.
- what is a good loss function for training a ranker: Learning To Rank (LTR)
- pointwise learning
- pairwise learning
- listwise learning
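The three LTR flavors differ in what one training example looks like: a single doc, a pair of docs, or the whole result list. Minimal sketch using common textbook choices (logistic loss, RankNet-style pairwise loss, ListNet-style softmax cross-entropy). The talk does not say which losses are actually used, so these are assumptions, and the function names are my own.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_loss(score, label):
    """Binary cross-entropy for one (query, doc) with label in {0, 1}."""
    return softplus(score) - label * score


def pairwise_loss(score_pos, score_neg):
    """Logistic loss on the score gap: small when the relevant doc scores higher."""
    return softplus(-(score_pos - score_neg))


def listwise_loss(scores, labels):
    """Softmax cross-entropy over the whole list; assumes sum(labels) > 0."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    total = sum(labels)
    return -sum((l / total) * (s - log_z) for s, l in zip(scores, labels))


# The relevant doc ranked first gives a lower loss than ranked second.
print(listwise_loss([3.0, 0.0, 0.0], [1, 0, 0]) < listwise_loss([0.0, 3.0, 0.0], [1, 0, 0]))  # True
```

Pointwise ignores the other docs in the list, pairwise only cares about relative order of two docs, and listwise compares the full score distribution against the labels.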
Infrastructure
- Pro-ML platform: model training for online/offline, model serving