Managing patients with complex multimorbidity has long been recognized as a difficult problem because of complex disease and medication dependencies and the risk of adverse drug interactions. Existing work uses either complicated rule-based protocols, which are hard to implement and maintain, or simple statistical models that treat each disease independently, which may lead to sub-optimal or even harmful drug combinations. In this work, we propose the LEAP (LEArn to Prescribe) algorithm, which decomposes treatment recommendation into a sequential decision-making process while automatically determining the appropriate number of medications. A recurrent decoder models label dependencies, and content-based attention captures the label-instance mapping. We further leverage reinforcement learning to fine-tune the model parameters to ensure accuracy and completeness. We incorporate external clinical knowledge into the design of the reinforcement reward to effectively prevent the generation of unfavorable drug combinations. We conduct both quantitative experiments and qualitative case studies on two real-world electronic health record datasets to verify the effectiveness of our solution. On both datasets, LEAP significantly outperforms baselines by up to 10-30% in terms of mean Jaccard coefficient and removes 99.8% of adverse drug interactions in the recommended treatment sets.
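The two quantities the abstract leans on, the mean Jaccard coefficient between recommended and actual medication sets and a reward informed by known adverse drug interactions, can be illustrated with a minimal sketch. This is not LEAP's actual reward: the function names, the representation of external knowledge as a set of unordered adverse drug pairs, and the linear `penalty` per adverse pair are all hypothetical assumptions chosen only to show how such knowledge could shape a reinforcement reward.

```python
from itertools import combinations


def jaccard(predicted, actual):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| between two medication sets."""
    predicted, actual = set(predicted), set(actual)
    union = predicted | actual
    if not union:
        return 1.0  # assumption: two empty sets count as a perfect match
    return len(predicted & actual) / len(union)


def mean_jaccard(predictions, ground_truths):
    """Average Jaccard coefficient over paired prediction / ground-truth sets."""
    scores = [jaccard(p, g) for p, g in zip(predictions, ground_truths)]
    if not scores:
        raise ValueError("need at least one prediction")
    return sum(scores) / len(scores)


def ddi_pairs(medications, adverse_pairs):
    """Return the (sorted) drug pairs in `medications` that appear in `adverse_pairs`.

    `adverse_pairs` is a hypothetical stand-in for external clinical knowledge:
    a set of frozensets, each holding two drugs known to interact adversely.
    """
    return [
        pair
        for pair in combinations(sorted(set(medications)), 2)
        if frozenset(pair) in adverse_pairs
    ]


def reward(predicted, actual, adverse_pairs, penalty=1.0):
    """Hypothetical reward: Jaccard accuracy minus a fixed penalty per adverse pair."""
    return jaccard(predicted, actual) - penalty * len(ddi_pairs(predicted, adverse_pairs))


if __name__ == "__main__":
    adverse = {frozenset({"warfarin", "aspirin"})}
    print(ddi_pairs(["warfarin", "aspirin", "metformin"], adverse))
    print(reward(["warfarin", "aspirin", "metformin"], ["warfarin", "metformin"], adverse))
```

Under these assumptions, the recommendation `["warfarin", "aspirin", "metformin"]` matches two of three drugs in the union (Jaccard 2/3) but contains one flagged pair, so its reward drops to 2/3 - 1. A policy trained on such a signal is pushed away from combinations listed as adverse even when they overlap well with the recorded prescription.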