Short-form video feeds have become one of the most popular ways for billions of users to interact with content: within a session, users watch short-form videos lasting a few seconds each, one after another. The common approach to improving the quality of experience (QoE) of short-form video feeds is to treat the feed as a standard sequential item recommendation problem and maximize the predicted click-through rate. However, under dynamic network conditions, the QoE of short-form video streaming is jointly determined by recommendation accuracy and streaming efficiency, so optimizing recommendation alone degrades the QoE that the streaming system delivers to its audience. In this paper, we propose SSR, a short-form video Streaming and Recommendation system, which consists of a Transformer-based recommendation module and a reinforcement learning (RL) based bitrate adaptation streaming module. Specifically, we use a Transformer to encode the session into a representation vector and recommend suitable short-form videos based on the user's recent interests and the timeliness of short-form video content. The RL module then combines the session representation with other observations collected during playback and selects the bitrate for the next short-form video to optimize a given QoE objective. Trace-driven emulations demonstrate the advantage of SSR over several state-of-the-art recommender systems and streaming strategies, with QoE improvements of at least 10%-15% under various QoE objectives.
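The abstract does not specify which QoE objective the RL module optimizes. As an illustration only, the sketch below assumes the linear QoE form that is common in bitrate-adaptation work: a per-video quality reward, minus a penalty on rebuffering time, minus a penalty on quality switches between consecutive videos. The function name `linear_qoe`, the quality mapping q(R) = R in Mbps, and the default weights are hypothetical choices, not SSR's definitions.

```python
from typing import Sequence


def linear_qoe(
    bitrates_kbps: Sequence[float],
    rebuffer_s: Sequence[float],
    rebuffer_penalty: float = 4.3,  # hypothetical weight on stall time
    smooth_penalty: float = 1.0,    # hypothetical weight on quality switches
) -> float:
    """Assumed linear QoE over a session of short videos (illustrative, not SSR's metric).

    bitrates_kbps[i] is the bitrate chosen for video i;
    rebuffer_s[i] is the stall time (seconds) incurred while playing video i.
    """
    if len(bitrates_kbps) != len(rebuffer_s):
        raise ValueError("need one rebuffering value per video")

    # Assumed quality mapping: q(R) = bitrate in Mbps.
    quality = [r / 1000.0 for r in bitrates_kbps]

    utility = sum(quality)                             # reward for higher bitrates
    rebuffering = rebuffer_penalty * sum(rebuffer_s)   # penalty for stalls
    smoothness = smooth_penalty * sum(                 # penalty for bitrate switches
        abs(nxt - cur) for cur, nxt in zip(quality, quality[1:])
    )
    return utility - rebuffering - smoothness


if __name__ == "__main__":
    # Three videos: one switch from 1 to 2 Mbps and a 0.5 s stall on video 2.
    print(linear_qoe([1000, 2000, 2000], [0.0, 0.5, 0.0]))
```

In this toy session the score is 5.0 - 2.15 - 1.0, roughly 1.85. Changing the two penalty weights is one way to express the "various QoE objectives" the abstract mentions, for example favoring stall avoidance over peak bitrate.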