Short video feeds now attract billions of mobile users worldwide, who interact with content effortlessly, and this has driven explosive growth in short video commerce. Typically, users watch full-screen short videos of a few seconds one by one from a watch list generated by a recommender system, skipping those they are not interested in. However, the recommender system in the cloud makes user-interest-specific decisions based mostly on users’ behavior data collected within the application itself (e.g., users’ view history), without examining lower-layer network and communication statistics. When playback stalls because of limited network bandwidth, the user will probably skip the video, which wastes bandwidth and degrades the user’s quality of experience (QoE). Meanwhile, the excessive number of user requests for video content imposes a heavy computational load and communication cost on the cloud recommender system, which must determine in real time which videos to recommend and deliver to each user. Advances in edge computing offer a promising avenue: deploying edge nodes with caches (e.g., household devices) beyond the cloud and edge servers, so that the recommender system in the cloud can place popular video content closer to client users and the content can be delivered to them under good network conditions. In this paper, we propose CFP, a cross-layer recommender system for short video streaming with a fine-grained preloading technique at the network edge. CFP jointly optimizes the recommendation effect of the video application and the content preloading efficiency under various network conditions at the network edge. 
CFP takes a two-stage approach. First, the cloud server performs edge-wise, rather than user-interest-specific, recommendation with a neural collaborative filtering recommender and preloads a list of candidate videos to the edge nodes. Second, each edge node, which deploys a GRU with attention, delivers the appropriate video content to the client user device according to the user’s preference. Trace-driven emulations demonstrate the efficiency of the proposed CFP scheme.
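
The two-stage flow can be illustrated with a minimal sketch. It is not the CFP implementation: plain score dictionaries stand in for the outputs of the neural collaborative filtering recommender and the GRU with attention, and the function names, the `cache_size` parameter, and the bitrate-versus-bandwidth check are hypothetical choices made only for illustration.

```python
def cloud_preload(edge_scores, cache_size):
    """Stage 1 (cloud): choose the top `cache_size` videos for each edge node.

    `edge_scores` maps edge id -> {video id: edge-wise score}. In CFP these
    scores would come from the cloud recommender; here they are given inputs.
    """
    return {
        edge: sorted(scores, key=lambda v: (-scores[v], v))[:cache_size]
        for edge, scores in edge_scores.items()
    }


def edge_deliver(cached, user_pref, bitrate_kbps, bandwidth_kbps):
    """Stage 2 (edge): pick the cached video the user prefers most.

    Assumption: a video counts as playable only if its bitrate fits within
    the user's current bandwidth. Returns None if nothing is playable.
    """
    playable = [v for v in cached if bitrate_kbps[v] <= bandwidth_kbps]
    if not playable:
        return None
    # Highest preference first; ties are broken by video id for determinism.
    return sorted(playable, key=lambda v: (-user_pref.get(v, 0.0), v))[0]


# Hypothetical data for one edge node and one user.
edge_scores = {"edge_a": {"v1": 0.9, "v2": 0.4, "v3": 0.7}}
cache = cloud_preload(edge_scores, cache_size=2)
choice = edge_deliver(
    cache["edge_a"],
    user_pref={"v1": 0.2, "v3": 0.8},
    bitrate_kbps={"v1": 800, "v3": 1200},
    bandwidth_kbps=1500,
)
print(cache, choice)
```

In this sketch the cloud never sees per-user preferences; it only decides what each edge node caches, and the per-user choice is made at the edge from that preloaded candidate list.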