1. Review the import statements: apache_beam is a third-party package, so its imports should stay absolute, e.g. from apache_beam.options.pipeline_options import PipelineOptions. Writing from .apache_beam.options.pipeline_options import PipelineOptions would only resolve if apache_beam were a subpackage of the current package, which fails for an installed dependency on Python 3.7. Reserve explicit relative imports (from .module import name) for modules that live inside the same package as this file.
2. Remove unused imports: The code imports the os module and the urlsplit function but uses neither. Both import statements can be safely removed.
3. Handle the case when argv is not provided: The parse_d6w_config function assumes that argv is always passed in, which callers should not have to do. Change the signature to parse_d6w_config(argv=None) so the function also works when argv is omitted.
4. Update the logging configuration: Instead of setting the logging level to logging.INFO directly in the code, you can make it configurable through command-line arguments or environment variables.
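Suggestions 3 and 4 could be sketched together as follows. This is a minimal illustration, assuming parse_d6w_config is built on argparse; the --log_level flag, the LOG_LEVEL environment variable, and the configure_logging helper are hypothetical names that do not appear in the original code.

```python
import argparse
import logging
import os  # used here only to read the hypothetical LOG_LEVEL variable


def parse_d6w_config(argv=None):
    """Parse known flags; with argv=None argparse falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag: defaults to the LOG_LEVEL environment variable,
    # and to INFO when that variable is unset.
    parser.add_argument(
        "--log_level",
        default=os.environ.get("LOG_LEVEL", "INFO"),
    )
    # parse_known_args returns unrecognised flags (for example pipeline
    # options) in a separate list instead of raising an error.
    known_args, remaining_args = parser.parse_known_args(argv)
    return known_args, remaining_args


def configure_logging(level_name):
    """Set the root logger level from a name such as "DEBUG" or "info"."""
    level = getattr(logging, level_name.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"Unknown log level: {level_name!r}")
    logging.getLogger().setLevel(level)
```

A caller would then run known_args, remaining_args = parse_d6w_config() followed by configure_logging(known_args.log_level), and pass remaining_args on to the pipeline options.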
Pushservice is the main recommendation service we use to surface recommendations to our users via notifications. It fetches candidates from various sources, ranks them in order of relevance, and applies filters to determine the best one to send.
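The fetch, rank, and filter flow described above can be pictured with a toy sketch. This is not Pushservice code: the Candidate type, its score field, and the pick_notification function are hypothetical names used only to illustrate the three stages.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Candidate:
    candidate_id: str
    score: float  # hypothetical relevance score; higher means more relevant


def pick_notification(
    sources: Iterable[Callable[[], List[Candidate]]],
    filters: Iterable[Callable[[Candidate], bool]],
) -> Optional[Candidate]:
    """Return the most relevant candidate that passes every filter, if any."""
    filters = list(filters)
    # 1. Fetch candidates from every source.
    candidates = [c for source in sources for c in source()]
    # 2. Rank them by relevance, best first.
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    # 3. Apply the filters and keep the first candidate that survives.
    for candidate in ranked:
        if all(keep(candidate) for keep in filters):
            return candidate
    return None
```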
Representation Scorer (RSX) serves as a centralized scoring system, offering SimClusters or other embedding-based scoring solutions as machine learning features.
Representation Manager (RMS) serves as a centralized embedding management system, providing SimClusters or other embeddings as a facade over the underlying storage or services.
Open sourcing the Aggregation Framework, a config-driven, Summingbird-based framework for generating real-time and batch aggregate features to be consumed by ML models.
Since the first batch of open sourcing, we have added the following components:
- User signal service
- Unified user actions
- Topic social proof service
Update the README to include these.
Unified User Action (UUA) is a centralized, real-time stream of user actions on Twitter, consumed by various product, ML, and marketing teams. UUA ensures that all internal teams consume the same unified user action data accurately and quickly.
User Signal Service (USS) is a centralized online platform that supplies comprehensive data on user actions and behaviors on Twitter. This service stores information on both explicit signals, such as Favorites, Retweets, and replies, and implicit signals like Tweet clicks, profile visits, and more.
Topic Social Proof Service (TSPS) delivers highly relevant topics tailored to a user's interests by analyzing topic preferences, such as following or unfollowing, and employing semantic annotations and other machine learning models.
Remove unused ranking params which are specified by services when making an Earlybird relevance search.
For cr-mixer: since we always set useTensorflowRanking = true in EarlybirdSimilarityEngineRouter, we only ever use the TensorFlowBasedScoringFunction for ranking search results. That function doesn't rely on any of the linear params specified in getLinearRankingParams, nor on the boosts, because we set applyBoosts = false in the request. These parameters are therefore strictly redundant.
The parameters in home-mixer can be removed for essentially the same reason: they are redundant given that we use the TensorFlow scoring function and don't apply boosts.
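The redundancy argument can be illustrated with a toy Python model. This is not the Earlybird or cr-mixer code (which is not Python): RelevanceRequest, score, and all field names are hypothetical stand-ins that mirror the useTensorflowRanking and applyBoosts settings mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RelevanceRequest:
    # Hypothetical mirror of the request settings described above.
    use_tensorflow_ranking: bool = True
    apply_boosts: bool = False
    linear_params: Dict[str, float] = field(default_factory=dict)
    boosts: Dict[str, float] = field(default_factory=dict)


def score(request: RelevanceRequest, features: Dict[str, float], model_score: float) -> float:
    """Toy scorer: the linear weights matter only when the model is not used."""
    if request.use_tensorflow_ranking:
        base = model_score  # the model's output; linear_params is never read
    else:
        base = sum(w * features.get(name, 0.0) for name, w in request.linear_params.items())
    if request.apply_boosts:  # skipped entirely when apply_boosts is False
        for name, boost in request.boosts.items():
            if features.get(name, 0.0) > 0:
                base *= boost
    return base
```

In this model, a request with use_tensorflow_ranking=True and apply_boosts=False yields the same score whether or not linear_params and boosts are populated, which is why dropping them does not change ranking.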
Please note that we have force-pushed a new initial commit in order to remove some publicly available Twitter user information. This process may be required again in the future.