During the training of medical large language models (LLMs), conventional tokenizers frequently segment domain-specific medical terms into multiple subword tokens, resulting in suboptimal recognition ...