[{"data":1,"prerenderedAt":3270},["ShallowReactive",2],{"article-ai\u002Frdkit-smiles-descriptors-fingerprints":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"tags":11,"body":15,"_type":3264,"_id":3265,"_source":3266,"_file":3267,"_stem":3268,"_extension":3269},"\u002Farticles\u002Fai\u002Frdkit-smiles-descriptors-fingerprints","ai",false,"","From SMILES to Molecular Fingerprints: Using RDKit to Turn Molecules into Vectors That Machine Learning Can Read","Set against an AI drug discovery platform project, this article walks through SMILES, Mol objects, canonical SMILES, molecular descriptors, Morgan fingerprints, and Tanimoto similarity to explain how RDKit turns chemical structures into numerical features usable for retrieval, modeling, and prediction.","2026-05-21",[12,13,14],"人工智能","生物信息学","AI制药",{"type":16,"children":17,"toc":3249},"root",[18,26,31,44,49,69,74,81,86,115,245,269,274,309,321,357,362,368,373,378,392,397,553,558,563,577,582,588,593,598,649,654,659,665,678,753,763,784,789,857,862,867,933,938,1017,1028,1034,1039,1043,1065,1070,1075,1165,1170,1184,1189,1195,1200,1212,1589,1594,1599,1605,1610,1615,1678,1683,1697,1702,1707,1903,1908,2074,2079,2084,2089,2094,2100,2105,2110,2149,2154,2159,2190,2195,2201,2206,2211,2216,2221,2244,2249,2254,2412,2425,2430,2529,2534,2540,2545,2550,2564,2569,2574,2721,2726,2731,2778,2783,2795,2800,2805,3059,3064,3103,3108,3113,3118,3123,3186,3191,3196,3201,3206,3243],{"type":19,"tag":20,"props":21,"children":22},"element","p",{},[23],{"type":24,"value":25},"text","In an AI drug discovery project, the first hurdle is not the model but the representation.",{"type":19,"tag":20,"props":27,"children":28},{},[29],{"type":24,"value":30},"A model cannot directly understand “aspirin”, “ethanol”, or the 2D structure diagrams in a textbook. What it needs is numbers: vectors, data structures that can be fed into tabular models, neural networks, or similarity search systems.",{"type":19,"tag":20,"props":32,"children":33},{},[34,36,42],{"type":24,"value":35},"This article collects the core notes from my first two weeks on a drug discovery platform project: ",{"type":19,"tag":37,"props":38,"children":39},"strong",{},[40],{"type":24,"value":41},"SMILES, RDKit, Mol objects, molecular descriptors, Morgan fingerprints, and Tanimoto similarity",{"type":24,"value":43},".",{"type":19,"tag":20,"props":45,"children":46},{},[47],{"type":24,"value":48},"Summed up in one line:",{"type":19,"tag":50,"props":51,"children":54},"pre",{"className":52,"code":53,"language":24,"meta":7,"style":7},"language-text shiki 
shiki-themes github-dark","SMILES string -> RDKit parsing -> Mol object -> descriptors \u002F fingerprints -> similarity search \u002F machine learning model\n",[55],{"type":19,"tag":56,"props":57,"children":58},"code",{"__ignoreMap":7},[59],{"type":19,"tag":60,"props":61,"children":64},"span",{"class":62,"line":63},"line",1,[65],{"type":19,"tag":60,"props":66,"children":67},{},[68],{"type":24,"value":53},{"type":19,"tag":20,"props":70,"children":71},{},[72],{"type":24,"value":73},"The pipeline looks short, but it is the starting point of many AI drug discovery engineering systems.",{"type":19,"tag":75,"props":76,"children":78},"h2",{"id":77},"先把-python-环境管好",[79],{"type":24,"value":80},"First, get the Python environment under control",{"type":19,"tag":20,"props":82,"children":83},{},[84],{"type":24,"value":85},"Before writing any RDKit code, deal with a very practical question: how to manage the project's dependencies.",{"type":19,"tag":20,"props":87,"children":88},{},[89,91,97,99,105,107,113],{"type":24,"value":90},"These days I prefer to use ",{"type":19,"tag":56,"props":92,"children":94},{"className":93},[],[95],{"type":24,"value":96},"uv",{"type":24,"value":98}," to manage Python projects. You can think of it as the counterpart of the front-end world's ",{"type":19,"tag":56,"props":100,"children":102},{"className":101},[],[103],{"type":24,"value":104},"npm",{"type":24,"value":106}," \u002F ",{"type":19,"tag":56,"props":108,"children":110},{"className":109},[],[111],{"type":24,"value":112},"pnpm",{"type":24,"value":114},":",{"type":19,"tag":116,"props":117,"children":118},"table",{},[119,143],{"type":19,"tag":120,"props":121,"children":122},"thead",{},[123],{"type":19,"tag":124,"props":125,"children":126},"tr",{},[127,133,138],{"type":19,"tag":128,"props":129,"children":130},"th",{},[131],{"type":24,"value":132},"Python 
project",{"type":19,"tag":128,"props":134,"children":135},{},[136],{"type":24,"value":137},"Front-end project",{"type":19,"tag":128,"props":139,"children":140},{},[141],{"type":24,"value":142},"Purpose",{"type":19,"tag":144,"props":145,"children":146},"tbody",{},[147,174,207,233],{"type":19,"tag":124,"props":148,"children":149},{},[150,160,169],{"type":19,"tag":151,"props":152,"children":153},"td",{},[154],{"type":19,"tag":56,"props":155,"children":157},{"className":156},[],[158],{"type":24,"value":159},"pyproject.toml",{"type":19,"tag":151,"props":161,"children":162},{},[163],{"type":19,"tag":56,"props":164,"children":166},{"className":165},[],[167],{"type":24,"value":168},"package.json",{"type":19,"tag":151,"props":170,"children":171},{},[172],{"type":24,"value":173},"Declares which dependencies the project needs and their approximate version ranges",{"type":19,"tag":124,"props":175,"children":176},{},[177,186,202],{"type":19,"tag":151,"props":178,"children":179},{},[180],{"type":19,"tag":56,"props":181,"children":183},{"className":182},[],[184],{"type":24,"value":185},"uv.lock",{"type":19,"tag":151,"props":187,"children":188},{},[189,195,196],{"type":19,"tag":56,"props":190,"children":192},{"className":191},[],[193],{"type":24,"value":194},"package-lock.json",{"type":24,"value":106},{"type":19,"tag":56,"props":197,"children":199},{"className":198},[],[200],{"type":24,"value":201},"pnpm-lock.yaml",{"type":19,"tag":151,"props":203,"children":204},{},[205],{"type":24,"value":206},"Records the exact versions resolved this time",{"type":19,"tag":124,"props":208,"children":209},{},[210,219,228],{"type":19,"tag":151,"props":211,"children":212},{},[213],{"type":19,"tag":56,"props":214,"children":216},{"className":215},[],[217],{"type":24,"value":218},".venv",{"type":19,"tag":151,"props":220,"children":221},{},[222],{"type":19,"tag":56,"props":223,"children":225},{"className":224},[],[226],{"type":24,"value":227},"node_modules",{"type":19,"tag":151,"props":229,"children":230},{},[231],{"type":24,"value":232},"The actual, runnable dependency environment of the current project",{"type":19,"tag":124,"props":234,"children":235},{},[236,239,242
],{"type":19,"tag":151,"props":237,"children":238},{},[],{"type":19,"tag":151,"props":240,"children":241},{},[],{"type":19,"tag":151,"props":243,"children":244},{},[],{"type":19,"tag":20,"props":246,"children":247},{},[248,253,255,260,262,267],{"type":19,"tag":56,"props":249,"children":251},{"className":250},[],[252],{"type":24,"value":159},{"type":24,"value":254}," says, “I need RDKit, Pillow, NumPy, and so on, each within a certain version range.” ",{"type":19,"tag":56,"props":256,"children":258},{"className":257},[],[259],{"type":24,"value":185},{"type":24,"value":261}," records “exactly which versions were resolved this time.” ",{"type":19,"tag":56,"props":263,"children":265},{"className":264},[],[266],{"type":24,"value":218},{"type":24,"value":268}," holds the Python interpreter and the dependency packages the project actually uses at run time.",{"type":19,"tag":20,"props":270,"children":271},{},[272],{"type":24,"value":273},"So when running a script, avoid invoking the system Python directly like this:",{"type":19,"tag":50,"props":275,"children":279},{"className":276,"code":277,"language":278,"meta":7,"style":7},"language-bash shiki shiki-themes github-dark","python scripts\u002Fvalidate_smiles.py --smiles \"CCO\"\n","bash",[280],{"type":19,"tag":56,"props":281,"children":282},{"__ignoreMap":7},[283],{"type":19,"tag":60,"props":284,"children":285},{"class":62,"line":63},[286,292,298,304],{"type":19,"tag":60,"props":287,"children":289},{"style":288},"--shiki-default:#B392F0",[290],{"type":24,"value":291},"python",{"type":19,"tag":60,"props":293,"children":295},{"style":294},"--shiki-default:#9ECBFF",[296],{"type":24,"value":297}," scripts\u002Fvalidate_smiles.py",{"type":19,"tag":60,"props":299,"children":301},{"style":300},"--shiki-default:#79B8FF",[302],{"type":24,"value":303}," --smiles",{"type":19,"tag":60,"props":305,"children":306},{"style":294},[307],{"type":24,"value":308}," \"CCO\"\n",{"type":19,"tag":20,"props":310,"children":311},{},[312,314,319],{"type":24,"value":313},"It is better to let ",{"type":19,"tag":56,"props":315,"children":317},{"className":316},[],[318],{"type":24,"value":96},{"type":24,"value":320}," 
use the project's own environment:",{"type":19,"tag":50,"props":322,"children":324},{"className":276,"code":323,"language":278,"meta":7,"style":7},"uv run python scripts\u002Fvalidate_smiles.py --smiles \"CCO\"\n",[325],{"type":19,"tag":56,"props":326,"children":327},{"__ignoreMap":7},[328],{"type":19,"tag":60,"props":329,"children":330},{"class":62,"line":63},[331,335,340,345,349,353],{"type":19,"tag":60,"props":332,"children":333},{"style":288},[334],{"type":24,"value":96},{"type":19,"tag":60,"props":336,"children":337},{"style":294},[338],{"type":24,"value":339}," run",{"type":19,"tag":60,"props":341,"children":342},{"style":294},[343],{"type":24,"value":344}," python",{"type":19,"tag":60,"props":346,"children":347},{"style":294},[348],{"type":24,"value":297},{"type":19,"tag":60,"props":350,"children":351},{"style":300},[352],{"type":24,"value":303},{"type":19,"tag":60,"props":354,"children":355},{"style":294},[356],{"type":24,"value":308},{"type":19,"tag":20,"props":358,"children":359},{},[360],{"type":24,"value":361},"This may seem unrelated to chemistry, but it determines whether someone who pulls the project can reproduce your results. On an AI drug discovery platform, dependency versions, the runtime environment, and the data processing pipeline are all part of the system.",{"type":19,"tag":75,"props":363,"children":365},{"id":364},"smiles把分子写成字符串",[366],{"type":24,"value":367},"SMILES: writing a molecule as a string",{"type":19,"tag":20,"props":369,"children":370},{},[371],{"type":24,"value":372},"SMILES 
can be understood as “a string representation of a molecule”.",{"type":19,"tag":20,"props":374,"children":375},{},[376],{"type":24,"value":377},"For example, ethanol can be written as:",{"type":19,"tag":50,"props":379,"children":381},{"className":52,"code":380,"language":24,"meta":7,"style":7},"CCO\n",[382],{"type":19,"tag":56,"props":383,"children":384},{"__ignoreMap":7},[385],{"type":19,"tag":60,"props":386,"children":387},{"class":62,"line":63},[388],{"type":19,"tag":60,"props":389,"children":390},{},[391],{"type":24,"value":380},{"type":19,"tag":20,"props":393,"children":394},{},[395],{"type":24,"value":396},"In a string like this, every symbol carries chemical meaning:",{"type":19,"tag":116,"props":398,"children":399},{},[400,416],{"type":19,"tag":120,"props":401,"children":402},{},[403],{"type":19,"tag":124,"props":404,"children":405},{},[406,411],{"type":19,"tag":128,"props":407,"children":408},{},[409],{"type":24,"value":410},"Symbol",{"type":19,"tag":128,"props":412,"children":413},{},[414],{"type":24,"value":415},"Meaning",{"type":19,"tag":144,"props":417,"children":418},{},[419,436,453,470,487,504,521,534],{"type":19,"tag":124,"props":420,"children":421},{},[422,431],{"type":19,"tag":151,"props":423,"children":424},{},[425],{"type":19,"tag":56,"props":426,"children":428},{"className":427},[],[429],{"type":24,"value":430},"C",{"type":19,"tag":151,"props":432,"children":433},{},[434],{"type":24,"value":435},"Carbon atom",{"type":19,"tag":124,"props":437,"children":438},{},[439,448],{"type":19,"tag":151,"props":440,"children":441},{},[442],{"type":19,"tag":56,"props":443,"children":445},{"className":444},[],[446],{"type":24,"value":447},"O",{"type":19,"tag":151,"props":449,"children":450},{},[451],{"type":24,"value":452},"Oxygen atom",{"type":19,"tag":124,"props":454,"children":455},{},[456,465],{"type":19,"tag":151,"props":457,"children":458},{},[459],{"type":19,"tag":56,"props":460,"children":462},{"className":461},[],[463],{"type":24,"value":464},"N",{"type":19,"tag":151,"props":466,"children":467},{},[468],{"type":24,"value":469},"Nitrogen atom",{"type":19,"tag":124,"props":471,"children":472},{},[473,482
],{"type":19,"tag":151,"props":474,"children":475},{},[476],{"type":19,"tag":56,"props":477,"children":479},{"className":478},[],[480],{"type":24,"value":481},"=",{"type":19,"tag":151,"props":483,"children":484},{},[485],{"type":24,"value":486},"Double bond",{"type":19,"tag":124,"props":488,"children":489},{},[490,499],{"type":19,"tag":151,"props":491,"children":492},{},[493],{"type":19,"tag":56,"props":494,"children":496},{"className":495},[],[497],{"type":24,"value":498},"#",{"type":19,"tag":151,"props":500,"children":501},{},[502],{"type":24,"value":503},"Triple bond",{"type":19,"tag":124,"props":505,"children":506},{},[507,516],{"type":19,"tag":151,"props":508,"children":509},{},[510],{"type":19,"tag":56,"props":511,"children":513},{"className":512},[],[514],{"type":24,"value":515},"()",{"type":19,"tag":151,"props":517,"children":518},{},[519],{"type":24,"value":520},"Branch",{"type":19,"tag":124,"props":522,"children":523},{},[524,529],{"type":19,"tag":151,"props":525,"children":526},{},[527],{"type":24,"value":528},"Digits",{"type":19,"tag":151,"props":530,"children":531},{},[532],{"type":24,"value":533},"Ring closure",{"type":19,"tag":124,"props":535,"children":536},{},[537,548],{"type":19,"tag":151,"props":538,"children":539},{},[540,542],{"type":24,"value":541},"Lowercase ",{"type":19,"tag":56,"props":543,"children":545},{"className":544},[],[546],{"type":24,"value":547},"c",{"type":19,"tag":151,"props":549,"children":550},{},[551],{"type":24,"value":552},"Aromatic carbon, as in a benzene ring",{"type":19,"tag":20,"props":554,"children":555},{},[556],{"type":24,"value":557},"To borrow a front-end analogy, SMILES is a bit like HTML: HTML describes page structure in text, and SMILES 
describes molecular structure in text.",{"type":19,"tag":20,"props":559,"children":560},{},[561],{"type":24,"value":562},"Aspirin, for example, can be written as:",{"type":19,"tag":50,"props":564,"children":566},{"className":52,"code":565,"language":24,"meta":7,"style":7},"CC(=O)Oc1ccccc1C(=O)O\n",[567],{"type":19,"tag":56,"props":568,"children":569},{"__ignoreMap":7},[570],{"type":19,"tag":60,"props":571,"children":572},{"class":62,"line":63},[573],{"type":19,"tag":60,"props":574,"children":575},{},[576],{"type":24,"value":565},{"type":19,"tag":20,"props":578,"children":579},{},[580],{"type":24,"value":581},"This is not a “chemical name” meant for people to read but a structural description meant for programs to parse. RDKit is the tool that turns such strings into computable objects.",{"type":19,"tag":75,"props":583,"children":585},{"id":584},"rdkit-负责什么",[586],{"type":24,"value":587},"What RDKit is responsible for",{"type":19,"tag":20,"props":589,"children":590},{},[591],{"type":24,"value":592},"RDKit is not a model, nor is it a classifier.",{"type":19,"tag":20,"props":594,"children":595},{},[596],{"type":24,"value":597},"It is more like a cheminformatics toolbox that can do things like these:",{"type":19,"tag":50,"props":599,"children":601},{"className":52,"code":600,"language":24,"meta":7,"style":7},"SMILES -> Mol object\nMol object -> molecule image\nMol object -> molecular descriptors\nMol object -> molecular fingerprint\nMol object -> canonical SMILES\n",[602],{"type":19,"tag":56,"props":603,"children":604},{"__ignoreMap":7},[605,613,622,631,640],{"type":19,"tag":60,"props":606,"children":607},{"class":62,"line":63},[608],{"type":19,"tag":60,"props":609,"children":610},{},[611],{"type":24,"value":612},"SMILES -> Mol object\n",{"type":19,"tag":60,"props":614,"children":616},{"class":62,"line":615},2,[617],{"type":19,"tag":60,"props":618,"children":619},{},[620],{"type":24,"value":621},"Mol object -> molecule image\n",{"type":19,"tag":60,"props":623,"children":625},{"class":62,"line":624},3,[626],{"type":19,"tag":60,"props":627,"children":628},{},[629],{"type":24,"value":630},"Mol object -> molecular descriptors\n",{"type":19,"tag":60,"props":632,"children":634},{"class":62,"line":633},4,[635],{"type":19,"tag":60,"props":636,"children":637},{},[638],{"type":24,"value":639},"Mol object -> 
molecular fingerprint\n",{"type":19,"tag":60,"props":641,"children":643},{"class":62,"line":642},5,[644],{"type":19,"tag":60,"props":645,"children":646},{},[647],{"type":24,"value":648},"Mol object -> canonical SMILES\n",{"type":19,"tag":20,"props":650,"children":651},{},[652],{"type":24,"value":653},"In an engineering pipeline, RDKit usually sits between the “raw input” and the “machine learning features”.",{"type":19,"tag":20,"props":655,"children":656},{},[657],{"type":24,"value":658},"Users supply SMILES; models need vectors. RDKit paves the stretch in between.",{"type":19,"tag":75,"props":660,"children":662},{"id":661},"mol-对象rdkit-眼里的分子",[663],{"type":24,"value":664},"The Mol object: a molecule as RDKit sees it",{"type":19,"tag":20,"props":666,"children":667},{},[668,670,676],{"type":24,"value":669},"After RDKit parses a SMILES string, you get a ",{"type":19,"tag":56,"props":671,"children":673},{"className":672},[],[674],{"type":24,"value":675},"Mol",{"type":24,"value":677}," object.",{"type":19,"tag":50,"props":679,"children":682},{"className":680,"code":681,"language":291,"meta":7,"style":7},"language-python shiki shiki-themes github-dark","from rdkit import Chem\n\nmol = Chem.MolFromSmiles(\"CCO\")\n\nif mol is None:\n    print(\"不合法 SMILES\")\nelse:\n    print(\"合法分子\")\n",[683],{"type":19,"tag":56,"props":684,"children":685},{"__ignoreMap":7},[686,694,703,711,718,726,735,744],{"type":19,"tag":60,"props":687,"children":688},{"class":62,"line":63},[689],{"type":19,"tag":60,"props":690,"children":691},{},[692],{"type":24,"value":693},"from rdkit import Chem\n",{"type":19,"tag":60,"props":695,"children":696},{"class":62,"line":615},[697],{"type":19,"tag":60,"props":698,"children":700},{"emptyLinePlaceholder":699},true,[701],{"type":24,"value":702},"\n",{"type":19,"tag":60,"props":704,"children":705},{"class":62,"line":624},[706],{"type":19,"tag":60,"props":707,"children":708},{},[709],{"type":24,"value":710},"mol = 
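Chem.MolFromSmiles(\"CCO\")\n",{"type":19,"tag":60,"props":712,"children":713},{"class":62,"line":633},[714],{"type":19,"tag":60,"props":715,"children":716},{"emptyLinePlaceholder":699},[717],{"type":24,"value":702},{"type":19,"tag":60,"props":719,"children":720},{"class":62,"line":642},[721],{"type":19,"tag":60,"props":722,"children":723},{},[724],{"type":24,"value":725},"if mol is None:\n",{"type":19,"tag":60,"props":727,"children":729},{"class":62,"line":728},6,[730],{"type":19,"tag":60,"props":731,"children":732},{},[733],{"type":24,"value":734},"    print(\"不合法 SMILES\")\n",{"type":19,"tag":60,"props":736,"children":738},{"class":62,"line":737},7,[739],{"type":19,"tag":60,"props":740,"children":741},{},[742],{"type":24,"value":743},"else:\n",{"type":19,"tag":60,"props":745,"children":747},{"class":62,"line":746},8,[748],{"type":19,"tag":60,"props":749,"children":750},{},[751],{"type":24,"value":752},"    print(\"合法分子\")\n",{"type":19,"tag":20,"props":754,"children":755},{},[756,761],{"type":19,"tag":56,"props":757,"children":759},{"className":758},[],[760],{"type":24,"value":675},{"type":24,"value":762}," object can be thought of as RDKit's internal molecular graph. Rather than just storing a piece of text, it stores atoms, bonds, aromaticity, valence, and related information.",{"type":19,"tag":20,"props":764,"children":765},{},[766,768,774,776,782],{"type":24,"value":767},"If ",{"type":19,"tag":56,"props":769,"children":771},{"className":770},[],[772],{"type":24,"value":773},"Chem.MolFromSmiles()",{"type":24,"value":775}," fails to parse the input, it returns ",{"type":19,"tag":56,"props":777,"children":779},{"className":778},[],[780],{"type":24,"value":781},"None",{"type":24,"value":783},". So wherever the system lets users enter SMILES, validate the input first.",{"type":19,"tag":20,"props":785,"children":786},{},[787],{"type":24,"value":788},"Inside a utility library, you can simply raise an exception:",{"type":19,"tag":50,"props":790,"children":792},{"className":680,"code":791,"language":291,"meta":7,"style":7},"from rdkit import Chem\n\n\ndef parse_smiles(smiles: str):\n    mol = Chem.MolFromSmiles(smiles)\n    if mol is None:\n        raise ValueError(f\"无效的 SMILES: {smiles}\")\n    return mol\n",[793],{"type":19,"tag":56,"props":794,"children":795},{"__ignoreMap":7},[796,803,810,817,825,833,841,849],{"type":19,"tag":60,"props":797,"children":798},{"class":62,"line":63},[799],{"type":19,"tag":60,"props":800,"children":801},{},[802],{"type":24,"value":693},{"type":19,"tag":60,"props":804,"children":805},{"class":62,"line":615},[806],{"type":19,"tag":60,"props":807,"children":808},{"emptyLinePlaceholder":699},[809],{"type":24,"value":702},{"type":19,"tag":60,"props":811,"children":812},{"class":62,"line":624},[813],{"type":19,"tag":60,"props":814,"children":815},{"emptyLinePlaceholder":699},[816],{"type":24,"value":702},{"type":19,"tag":60,"props":818,"children":819},{"class":62,"line":633},[820],{"type":19,"tag":60,"props":821,"children":822},{},[823],{"type":24,"value":824},"def parse_smiles(smiles: str):\n",{"type":19,"tag":60,"props":826,"children":827},{"class":62,"line":642},[828],{"type":19,"tag":60,"props":829,"children":830},{},[831],{"type":24,"value":832},"    mol = Chem.MolFromSmiles(smiles)\n",{"type":19,"tag":60,"props":834,"children":835},{"class":62,"line":728},[836],{"type":19,"tag":60,"props":837,"children":838},{},[839],{"type":24,"value":840},"    if mol is None:\n",{"type":19,"tag":60,"props":842,"children":843},{"class":62,"line":737},[844],{"type":19,"tag":60,"props":845,"children":846},{},[847],{"type":24,"value":848},"        raise ValueError(f\"无效的 SMILES: {smiles}\")\n",{"type":19,"tag":60,"props":850,"children":851},{"class":62,"line":746},[852],{"type":19,"tag":60,"props":853,"children":854},{},[855],{"type":24,"value":856},"   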
 return mol\n",{"type":19,"tag":20,"props":858,"children":859},{},[860],{"type":24,"value":861},"In a Web API, though, you should not throw the raw Python exception at the front end; convert it into a stable business error instead.",{"type":19,"tag":20,"props":863,"children":864},{},[865],{"type":24,"value":866},"In FastAPI it might look like this:",{"type":19,"tag":50,"props":868,"children":872},{"className":869,"code":870,"language":871,"meta":7,"style":7},"language-json shiki shiki-themes 
github-dark","{\n  \"code\": 400,\n  \"detail\": \"无效的 SMILES\"\n}\n","json",[873],{"type":19,"tag":56,"props":874,"children":875},{"__ignoreMap":7},[876,885,908,925],{"type":19,"tag":60,"props":877,"children":878},{"class":62,"line":63},[879],{"type":19,"tag":60,"props":880,"children":882},{"style":881},"--shiki-default:#E1E4E8",[883],{"type":24,"value":884},"{\n",{"type":19,"tag":60,"props":886,"children":887},{"class":62,"line":615},[888,893,898,903],{"type":19,"tag":60,"props":889,"children":890},{"style":300},[891],{"type":24,"value":892},"  \"code\"",{"type":19,"tag":60,"props":894,"children":895},{"style":881},[896],{"type":24,"value":897},": ",{"type":19,"tag":60,"props":899,"children":900},{"style":300},[901],{"type":24,"value":902},"400",{"type":19,"tag":60,"props":904,"children":905},{"style":881},[906],{"type":24,"value":907},",\n",{"type":19,"tag":60,"props":909,"children":910},{"class":62,"line":624},[911,916,920],{"type":19,"tag":60,"props":912,"children":913},{"style":300},[914],{"type":24,"value":915},"  \"detail\"",{"type":19,"tag":60,"props":917,"children":918},{"style":881},[919],{"type":24,"value":897},{"type":19,"tag":60,"props":921,"children":922},{"style":294},[923],{"type":24,"value":924},"\"无效的 SMILES\"\n",{"type":19,"tag":60,"props":926,"children":927},{"class":62,"line":633},[928],{"type":19,"tag":60,"props":929,"children":930},{"style":881},[931],{"type":24,"value":932},"}\n",{"type":19,"tag":20,"props":934,"children":935},{},[936],{"type":24,"value":937},"If the back end is Spring Boot, you can design a more explicit error structure:",{"type":19,"tag":50,"props":939,"children":941},{"className":869,"code":940,"language":871,"meta":7,"style":7},"{\n  \"code\": \"INVALID_SMILES\",\n  \"message\": \"您输入的分子式格式不正确\",\n  \"timestamp\": 
\"2026-05-18T10:00:00\"\n}\n",[942],{"type":19,"tag":56,"props":943,"children":944},{"__ignoreMap":7},[945,952,972,993,1010],{"type":19,"tag":60,"props":946,"children":947},{"class":62,"line":63},[948],{"type":19,"tag":60,"props":949,"children":950},{"style":881},[951],{"type":24,"value":884},{"type":19,"tag":60,"props":953,"children":954},{"class":62,"line":615},[955,959,963,968],{"type":19,"tag":60,"props":956,"children":957},{"style":300},[958],{"type":24,"value":892},{"type":19,"tag":60,"props":960,"children":961},{"style":881},[962],{"type":24,"value":897},{"type":19,"tag":60,"props":964,"children":965},{"style":294},[966],{"type":24,"value":967},"\"INVALID_SMILES\"",{"type":19,"tag":60,"props":969,"children":970},{"style":881},[971],{"type":24,"value":907},{"type":19,"tag":60,"props":973,"children":974},{"class":62,"line":624},[975,980,984,989],{"type":19,"tag":60,"props":976,"children":977},{"style":300},[978],{"type":24,"value":979},"  \"message\"",{"type":19,"tag":60,"props":981,"children":982},{"style":881},[983],{"type":24,"value":897},{"type":19,"tag":60,"props":985,"children":986},{"style":294},[987],{"type":24,"value":988},"\"您输入的分子式格式不正确\"",{"type":19,"tag":60,"props":990,"children":991},{"style":881},[992],{"type":24,"value":907},{"type":19,"tag":60,"props":994,"children":995},{"class":62,"line":633},[996,1001,1005],{"type":19,"tag":60,"props":997,"children":998},{"style":300},[999],{"type":24,"value":1000},"  
\"timestamp\"",{"type":19,"tag":60,"props":1002,"children":1003},{"style":881},[1004],{"type":24,"value":897},{"type":19,"tag":60,"props":1006,"children":1007},{"style":294},[1008],{"type":24,"value":1009},"\"2026-05-18T10:00:00\"\n",{"type":19,"tag":60,"props":1011,"children":1012},{"class":62,"line":642},[1013],{"type":19,"tag":60,"props":1014,"children":1015},{"style":881},[1016],{"type":24,"value":932},{"type":19,"tag":20,"props":1018,"children":1019},{},[1020,1022,1027],{"type":24,"value":1021},"This is the layer you have to add when moving from “writing scripts” to “building a platform”: ",{"type":19,"tag":37,"props":1023,"children":1024},{},[1025],{"type":24,"value":1026},"the underlying library detects errors, and the business API communicates them",{"type":24,"value":43},{"type":19,"tag":75,"props":1029,"children":1031},{"id":1030},"canonical-smiles统一同一个分子的不同写法",[1032],{"type":24,"value":1033},"Canonical SMILES: unifying different spellings of the same molecule",{"type":19,"tag":20,"props":1035,"children":1036},{},[1037],{"type":24,"value":1038},"The same molecule can be written as several different SMILES strings.",{"type":19,"tag":20,"props":1040,"children":1041},{},[1042],{"type":24,"value":377},{"type":19,"tag":50,"props":1044,"children":1046},{"className":52,"code":1045,"language":24,"meta":7,"style":7},"CCO\nOCC\n",[1047],{"type":19,"tag":56,"props":1048,"children":1049},{"__ignoreMap":7},[1050,1057],{"type":19,"tag":60,"props":1051,"children":1052},{"class":62,"line":63},[1053],{"type":19,"tag":60,"props":1054,"children":1055},{},[1056],{"type":24,"value":380},{"type":19,"tag":60,"props":1058,"children":1059},{"class":62,"line":615},[1060],{"type":19,"tag":60,"props":1061,"children":1062},{},[1063],{"type":24,"value":1064},"OCC\n",{"type":19,"tag":20,"props":1066,"children":1067},{},[1068],{"type":24,"value":1069},"A person can read both, but to a system that does no canonicalization they are two different strings. That affects deduplication, caching, database indexing, and similarity search.",{"type":19,"tag":20,"props":1071,"children":1072},{},[1073],{"type":24,"value":1074},"RDKit can unify the different spellings into one canonical form:",{"type":19,"tag":50,"props":1076,"children":1078},{"className":680,"code":1077,"language":291,"meta":7,"style":7},"from rdkit import Chem\n\n\ndef canonicalize_smiles(smiles: str) 
-> str:\n    mol = Chem.MolFromSmiles(smiles)\n    if mol is None:\n        raise ValueError(f\"无效的 SMILES: {smiles}\")\n    return Chem.MolToSmiles(mol, canonical=True)\n\n\nprint(canonicalize_smiles(\"OCC\"))\n",[1079],{"type":19,"tag":56,"props":1080,"children":1081},{"__ignoreMap":7},[1082,1089,1096,1103,1111,1118,1125,1132,1140,1148,1156],{"type":19,"tag":60,"props":1083,"children":1084},{"class":62,"line":63},[1085],{"type":19,"tag":60,"props":1086,"children":1087},{},[1088],{"type":24,"value":693},{"type":19,"tag":60,"props":1090,"children":1091},{"class":62,"line":615},[1092],{"type":19,"tag":60,"props":1093,"children":1094},{"emptyLinePlaceholder":699},[1095],{"type":24,"value":702},{"type":19,"tag":60,"props":1097,"children":1098},{"class":62,"line":624},[1099],{"type":19,"tag":60,"props":1100,"children":1101},{"emptyLinePlaceholder":699},[1102],{"type":24,"value":702},{"type":19,"tag":60,"props":1104,"children":1105},{"class":62,"line":633},[1106],{"type":19,"tag":60,"props":1107,"children":1108},{},[1109],{"type":24,"value":1110},"def canonicalize_smiles(smiles: str) -> str:\n",{"type":19,"tag":60,"props":1112,"children":1113},{"class":62,"line":642},[1114],{"type":19,"tag":60,"props":1115,"children":1116},{},[1117],{"type":24,"value":832},{"type":19,"tag":60,"props":1119,"children":1120},{"class":62,"line":728},[1121],{"type":19,"tag":60,"props":1122,"children":1123},{},[1124],{"type":24,"value":840},{"type":19,"tag":60,"props":1126,"children":1127},{"class":62,"line":737},[1128],{"type":19,"tag":60,"props":1129,"children":1130},{},[1131],{"type":24,"value":848},{"type":19,"tag":60,"props":1133,"children":1134},{"class":62,"line":746},[1135],{"type":19,"tag":60,"props":1136,"children":1137},{},[1138],{"type":24,"value":1139},"    return Chem.MolToSmiles(mol, 
canonical=True)\n",{"type":19,"tag":60,"props":1141,"children":1143},{"class":62,"line":1142},9,[1144],{"type":19,"tag":60,"props":1145,"children":1146},{"emptyLinePlaceholder":699},[1147],{"type":24,"value":702},{"type":19,"tag":60,"props":1149,"children":1151},{"class":62,"line":1150},10,[1152],{"type":19,"tag":60,"props":1153,"children":1154},{"emptyLinePlaceholder":699},[1155],{"type":24,"value":702},{"type":19,"tag":60,"props":1157,"children":1159},{"class":62,"line":1158},11,[1160],{"type":19,"tag":60,"props":1161,"children":1162},{},[1163],{"type":24,"value":1164},"print(canonicalize_smiles(\"OCC\"))\n",{"type":19,"tag":20,"props":1166,"children":1167},{},[1168],{"type":24,"value":1169},"On a platform, a common strategy is:",{"type":19,"tag":50,"props":1171,"children":1173},{"className":52,"code":1172,"language":24,"meta":7,"style":7},"User-supplied SMILES -> RDKit parsing -> canonical SMILES -> storage \u002F retrieval \u002F modeling\n",[1174],{"type":19,"tag":56,"props":1175,"children":1176},{"__ignoreMap":7},[1177],{"type":19,"tag":60,"props":1178,"children":1179},{"class":62,"line":63},[1180],{"type":19,"tag":60,"props":1181,"children":1182},{},[1183],{"type":24,"value":1172},{"type":19,"tag":20,"props":1185,"children":1186},{},[1187],{"type":24,"value":1188},"This way users can be flexible about input format, while the system uses one unified representation internally.",{"type":19,"tag":75,"props":1190,"children":1192},{"id":1191},"生成分子图片让结构可视化",[1193],{"type":24,"value":1194},"Generating molecule images: making structures visible",{"type":19,"tag":20,"props":1196,"children":1197},{},[1198],{"type":24,"value":1199},"Once a molecule is in the system, it is used for computation and often also needs to be shown to users.",{"type":19,"tag":20,"props":1201,"children":1202},{},[1203,1205,1210],{"type":24,"value":1204},"RDKit can draw a ",{"type":19,"tag":56,"props":1206,"children":1208},{"className":1207},[],[1209],{"type":24,"value":675},{"type":24,"value":1211}," object directly as an image:",{"type":19,"tag":50,"props":1213,"children":1215},{"className":680,"code":1214,"language":291,"meta":7,"style":7},"from pathlib import Path\n\nfrom PIL.Image import Image\nfrom rdkit import Chem, RDLogger\nfrom rdkit.Chem import 
Draw\n\nRDLogger.DisableLog(\"rdApp.*\")\n\nDEFAULT_SIZE = (300, 300)\n\n\ndef smiles_to_image(smiles: str, size: tuple[int, int] = DEFAULT_SIZE) -> Image:\n    mol = Chem.MolFromSmiles(smiles)\n    if mol is None:\n        raise ValueError(f\"无效的 SMILES: {smiles}\")\n\n    return Draw.MolToImage(mol, size=size)\n\n\ndef save_smiles_image(\n    smiles: str,\n    output_path: str | Path,\n    name: str = \"分子图片\",\n    size: tuple[int, int] = DEFAULT_SIZE,\n) -> Path:\n    path = Path(output_path)\n\n    if path.suffix == \"\":\n        path.mkdir(parents=True, exist_ok=True)\n        path = path \u002F f\"{name}.png\"\n    else:\n        path.parent.mkdir(parents=True, exist_ok=True)\n\n    image = smiles_to_image(smiles, size=size)\n    image.save(path)\n    return path\n\n\nif __name__ == \"__main__\":\n    save_smiles_image(\n        \"CC(=O)Oc1ccccc1C(=O)O\",\n        \"..\u002Foutputs\u002Fimages\u002F\",\n        name=\"阿司匹林\",\n    )\n",[1216],{"type":19,"tag":56,"props":1217,"children":1218},{"__ignoreMap":7},[1219,1227,1234,1242,1250,1258,1265,1273,1280,1288,1295,1302,1311,1319,1327,1335,1343,1352,1360,1368,1377,1386,1395,1404,1413,1422,1431,1439,1448,1457,1466,1475,1484,1492,1501,1510,1519,1527,1535,1544,1553,1562,1571,1580],{"type":19,"tag":60,"props":1220,"children":1221},{"class":62,"line":63},[1222],{"type":19,"tag":60,"props":1223,"children":1224},{},[1225],{"type":24,"value":1226},"from pathlib import Path\n",{"type":19,"tag":60,"props":1228,"children":1229},{"class":62,"line":615},[1230],{"type":19,"tag":60,"props":1231,"children":1232},{"emptyLinePlaceholder":699},[1233],{"type":24,"value":702},{"type":19,"tag":60,"props":1235,"children":1236},{"class":62,"line":624},[1237],{"type":19,"tag":60,"props":1238,"children":1239},{},[1240],{"type":24,"value":1241},"from PIL.Image import 
Image\n",{"type":19,"tag":60,"props":1243,"children":1244},{"class":62,"line":633},[1245],{"type":19,"tag":60,"props":1246,"children":1247},{},[1248],{"type":24,"value":1249},"from rdkit import Chem, RDLogger\n",{"type":19,"tag":60,"props":1251,"children":1252},{"class":62,"line":642},[1253],{"type":19,"tag":60,"props":1254,"children":1255},{},[1256],{"type":24,"value":1257},"from rdkit.Chem import Draw\n",{"type":19,"tag":60,"props":1259,"children":1260},{"class":62,"line":728},[1261],{"type":19,"tag":60,"props":1262,"children":1263},{"emptyLinePlaceholder":699},[1264],{"type":24,"value":702},{"type":19,"tag":60,"props":1266,"children":1267},{"class":62,"line":737},[1268],{"type":19,"tag":60,"props":1269,"children":1270},{},[1271],{"type":24,"value":1272},"RDLogger.DisableLog(\"rdApp.*\")\n",{"type":19,"tag":60,"props":1274,"children":1275},{"class":62,"line":746},[1276],{"type":19,"tag":60,"props":1277,"children":1278},{"emptyLinePlaceholder":699},[1279],{"type":24,"value":702},{"type":19,"tag":60,"props":1281,"children":1282},{"class":62,"line":1142},[1283],{"type":19,"tag":60,"props":1284,"children":1285},{},[1286],{"type":24,"value":1287},"DEFAULT_SIZE = (300, 300)\n",{"type":19,"tag":60,"props":1289,"children":1290},{"class":62,"line":1150},[1291],{"type":19,"tag":60,"props":1292,"children":1293},{"emptyLinePlaceholder":699},[1294],{"type":24,"value":702},{"type":19,"tag":60,"props":1296,"children":1297},{"class":62,"line":1158},[1298],{"type":19,"tag":60,"props":1299,"children":1300},{"emptyLinePlaceholder":699},[1301],{"type":24,"value":702},{"type":19,"tag":60,"props":1303,"children":1305},{"class":62,"line":1304},12,[1306],{"type":19,"tag":60,"props":1307,"children":1308},{},[1309],{"type":24,"value":1310},"def smiles_to_image(smiles: str, size: tuple[int, int] = DEFAULT_SIZE) -> 
Image:\n",{"type":19,"tag":60,"props":1312,"children":1314},{"class":62,"line":1313},13,[1315],{"type":19,"tag":60,"props":1316,"children":1317},{},[1318],{"type":24,"value":832},{"type":19,"tag":60,"props":1320,"children":1322},{"class":62,"line":1321},14,[1323],{"type":19,"tag":60,"props":1324,"children":1325},{},[1326],{"type":24,"value":840},{"type":19,"tag":60,"props":1328,"children":1330},{"class":62,"line":1329},15,[1331],{"type":19,"tag":60,"props":1332,"children":1333},{},[1334],{"type":24,"value":848},{"type":19,"tag":60,"props":1336,"children":1338},{"class":62,"line":1337},16,[1339],{"type":19,"tag":60,"props":1340,"children":1341},{"emptyLinePlaceholder":699},[1342],{"type":24,"value":702},{"type":19,"tag":60,"props":1344,"children":1346},{"class":62,"line":1345},17,[1347],{"type":19,"tag":60,"props":1348,"children":1349},{},[1350],{"type":24,"value":1351},"    return Draw.MolToImage(mol, size=size)\n",{"type":19,"tag":60,"props":1353,"children":1355},{"class":62,"line":1354},18,[1356],{"type":19,"tag":60,"props":1357,"children":1358},{"emptyLinePlaceholder":699},[1359],{"type":24,"value":702},{"type":19,"tag":60,"props":1361,"children":1363},{"class":62,"line":1362},19,[1364],{"type":19,"tag":60,"props":1365,"children":1366},{"emptyLinePlaceholder":699},[1367],{"type":24,"value":702},{"type":19,"tag":60,"props":1369,"children":1371},{"class":62,"line":1370},20,[1372],{"type":19,"tag":60,"props":1373,"children":1374},{},[1375],{"type":24,"value":1376},"def save_smiles_image(\n",{"type":19,"tag":60,"props":1378,"children":1380},{"class":62,"line":1379},21,[1381],{"type":19,"tag":60,"props":1382,"children":1383},{},[1384],{"type":24,"value":1385},"    smiles: str,\n",{"type":19,"tag":60,"props":1387,"children":1389},{"class":62,"line":1388},22,[1390],{"type":19,"tag":60,"props":1391,"children":1392},{},[1393],{"type":24,"value":1394},"    output_path: str | 
Path,\n",{"type":19,"tag":60,"props":1396,"children":1398},{"class":62,"line":1397},23,[1399],{"type":19,"tag":60,"props":1400,"children":1401},{},[1402],{"type":24,"value":1403},"    name: str = \"分子图片\",\n",{"type":19,"tag":60,"props":1405,"children":1407},{"class":62,"line":1406},24,[1408],{"type":19,"tag":60,"props":1409,"children":1410},{},[1411],{"type":24,"value":1412},"    size: tuple[int, int] = DEFAULT_SIZE,\n",{"type":19,"tag":60,"props":1414,"children":1416},{"class":62,"line":1415},25,[1417],{"type":19,"tag":60,"props":1418,"children":1419},{},[1420],{"type":24,"value":1421},") -> Path:\n",{"type":19,"tag":60,"props":1423,"children":1425},{"class":62,"line":1424},26,[1426],{"type":19,"tag":60,"props":1427,"children":1428},{},[1429],{"type":24,"value":1430},"    path = Path(output_path)\n",{"type":19,"tag":60,"props":1432,"children":1434},{"class":62,"line":1433},27,[1435],{"type":19,"tag":60,"props":1436,"children":1437},{"emptyLinePlaceholder":699},[1438],{"type":24,"value":702},{"type":19,"tag":60,"props":1440,"children":1442},{"class":62,"line":1441},28,[1443],{"type":19,"tag":60,"props":1444,"children":1445},{},[1446],{"type":24,"value":1447},"    if path.suffix == \"\":\n",{"type":19,"tag":60,"props":1449,"children":1451},{"class":62,"line":1450},29,[1452],{"type":19,"tag":60,"props":1453,"children":1454},{},[1455],{"type":24,"value":1456},"        path.mkdir(parents=True, exist_ok=True)\n",{"type":19,"tag":60,"props":1458,"children":1460},{"class":62,"line":1459},30,[1461],{"type":19,"tag":60,"props":1462,"children":1463},{},[1464],{"type":24,"value":1465},"        path = path \u002F f\"{name}.png\"\n",{"type":19,"tag":60,"props":1467,"children":1469},{"class":62,"line":1468},31,[1470],{"type":19,"tag":60,"props":1471,"children":1472},{},[1473],{"type":24,"value":1474},"    
else:\n",{"type":19,"tag":60,"props":1476,"children":1478},{"class":62,"line":1477},32,[1479],{"type":19,"tag":60,"props":1480,"children":1481},{},[1482],{"type":24,"value":1483},"        path.parent.mkdir(parents=True, exist_ok=True)\n",{"type":19,"tag":60,"props":1485,"children":1487},{"class":62,"line":1486},33,[1488],{"type":19,"tag":60,"props":1489,"children":1490},{"emptyLinePlaceholder":699},[1491],{"type":24,"value":702},{"type":19,"tag":60,"props":1493,"children":1495},{"class":62,"line":1494},34,[1496],{"type":19,"tag":60,"props":1497,"children":1498},{},[1499],{"type":24,"value":1500},"    image = smiles_to_image(smiles, size=size)\n",{"type":19,"tag":60,"props":1502,"children":1504},{"class":62,"line":1503},35,[1505],{"type":19,"tag":60,"props":1506,"children":1507},{},[1508],{"type":24,"value":1509},"    image.save(path)\n",{"type":19,"tag":60,"props":1511,"children":1513},{"class":62,"line":1512},36,[1514],{"type":19,"tag":60,"props":1515,"children":1516},{},[1517],{"type":24,"value":1518},"    return path\n",{"type":19,"tag":60,"props":1520,"children":1522},{"class":62,"line":1521},37,[1523],{"type":19,"tag":60,"props":1524,"children":1525},{"emptyLinePlaceholder":699},[1526],{"type":24,"value":702},{"type":19,"tag":60,"props":1528,"children":1530},{"class":62,"line":1529},38,[1531],{"type":19,"tag":60,"props":1532,"children":1533},{"emptyLinePlaceholder":699},[1534],{"type":24,"value":702},{"type":19,"tag":60,"props":1536,"children":1538},{"class":62,"line":1537},39,[1539],{"type":19,"tag":60,"props":1540,"children":1541},{},[1542],{"type":24,"value":1543},"if __name__ == \"__main__\":\n",{"type":19,"tag":60,"props":1545,"children":1547},{"class":62,"line":1546},40,[1548],{"type":19,"tag":60,"props":1549,"children":1550},{},[1551],{"type":24,"value":1552},"    
save_smiles_image(\n",{"type":19,"tag":60,"props":1554,"children":1556},{"class":62,"line":1555},41,[1557],{"type":19,"tag":60,"props":1558,"children":1559},{},[1560],{"type":24,"value":1561},"        \"CC(=O)Oc1ccccc1C(=O)O\",\n",{"type":19,"tag":60,"props":1563,"children":1565},{"class":62,"line":1564},42,[1566],{"type":19,"tag":60,"props":1567,"children":1568},{},[1569],{"type":24,"value":1570},"        \"..\u002Foutputs\u002Fimages\u002F\",\n",{"type":19,"tag":60,"props":1572,"children":1574},{"class":62,"line":1573},43,[1575],{"type":19,"tag":60,"props":1576,"children":1577},{},[1578],{"type":24,"value":1579},"        name=\"阿司匹林\",\n",{"type":19,"tag":60,"props":1581,"children":1583},{"class":62,"line":1582},44,[1584],{"type":19,"tag":60,"props":1585,"children":1586},{},[1587],{"type":24,"value":1588},"    )\n",{"type":19,"tag":20,"props":1590,"children":1591},{},[1592],{"type":24,"value":1593},"到这里，SMILES 已经不只是字符串了。它可以被验证、标准化，也可以被画出来。",{"type":19,"tag":20,"props":1595,"children":1596},{},[1597],{"type":24,"value":1598},"下一步是把它变成模型能用的数字。",{"type":19,"tag":75,"props":1600,"children":1602},{"id":1601},"分子描述符可解释的低维特征",[1603],{"type":24,"value":1604},"分子描述符：可解释的低维特征",{"type":19,"tag":20,"props":1606,"children":1607},{},[1608],{"type":24,"value":1609},"分子描述符是对一个分子的数值化描述。",{"type":19,"tag":20,"props":1611,"children":1612},{},[1613],{"type":24,"value":1614},"比如阿司匹林可以被描述成一组性质：",{"type":19,"tag":50,"props":1616,"children":1618},{"className":52,"code":1617,"language":24,"meta":7,"style":7},"分子量：180.16\nLogP：1.31\nTPSA：63.60\n氢键供体数量：1\n氢键受体数量：4\n可旋转键数量：2\n芳香环数量：1\n",[1619],{"type":19,"tag":56,"props":1620,"children":1621},{"__ignoreMap":7},[1622,1630,1638,1646,1654,1662,1670],{"type":19,"tag":60,"props":1623,"children":1624},{"class":62,"line":63},[1625],{"type":19,"tag":60,"props":1626,"children":1627},{},[1628],{"type":24,"value":1629},"分子量：180.16\n",{"type":19,"tag":60,"props":1631,"children":1632},{"class":62,"line":615},[1633],{"type":19,"tag":60,"props":1634,"child
ren":1635},{},[1636],{"type":24,"value":1637},"LogP：1.31\n",{"type":19,"tag":60,"props":1639,"children":1640},{"class":62,"line":624},[1641],{"type":19,"tag":60,"props":1642,"children":1643},{},[1644],{"type":24,"value":1645},"TPSA：63.60\n",{"type":19,"tag":60,"props":1647,"children":1648},{"class":62,"line":633},[1649],{"type":19,"tag":60,"props":1650,"children":1651},{},[1652],{"type":24,"value":1653},"氢键供体数量：1\n",{"type":19,"tag":60,"props":1655,"children":1656},{"class":62,"line":642},[1657],{"type":19,"tag":60,"props":1658,"children":1659},{},[1660],{"type":24,"value":1661},"氢键受体数量：4\n",{"type":19,"tag":60,"props":1663,"children":1664},{"class":62,"line":728},[1665],{"type":19,"tag":60,"props":1666,"children":1667},{},[1668],{"type":24,"value":1669},"可旋转键数量：2\n",{"type":19,"tag":60,"props":1671,"children":1672},{"class":62,"line":737},[1673],{"type":19,"tag":60,"props":1674,"children":1675},{},[1676],{"type":24,"value":1677},"芳香环数量：1\n",{"type":19,"tag":20,"props":1679,"children":1680},{},[1681],{"type":24,"value":1682},"这些数值可以组成一个向量：",{"type":19,"tag":50,"props":1684,"children":1686},{"className":52,"code":1685,"language":24,"meta":7,"style":7},"[180.16, 1.31, 63.60, 1, 4, 2, 
1]\n",[1687],{"type":19,"tag":56,"props":1688,"children":1689},{"__ignoreMap":7},[1690],{"type":19,"tag":60,"props":1691,"children":1692},{"class":62,"line":63},[1693],{"type":19,"tag":60,"props":1694,"children":1695},{},[1696],{"type":24,"value":1685},{"type":19,"tag":20,"props":1698,"children":1699},{},[1700],{"type":24,"value":1701},"模型并不理解“阿司匹林”这个名字，但它可以处理这样的数字向量。",{"type":19,"tag":20,"props":1703,"children":1704},{},[1705],{"type":24,"value":1706},"常见描述符包括：",{"type":19,"tag":116,"props":1708,"children":1709},{},[1710,1730],{"type":19,"tag":120,"props":1711,"children":1712},{},[1713],{"type":19,"tag":124,"props":1714,"children":1715},{},[1716,1721,1725],{"type":19,"tag":128,"props":1717,"children":1718},{},[1719],{"type":24,"value":1720},"描述符",{"type":19,"tag":128,"props":1722,"children":1723},{},[1724],{"type":24,"value":415},{"type":19,"tag":128,"props":1726,"children":1727},{},[1728],{"type":24,"value":1729},"直觉",{"type":19,"tag":144,"props":1731,"children":1732},{},[1733,1755,1777,1799,1837,1859,1881],{"type":19,"tag":124,"props":1734,"children":1735},{},[1736,1745,1750],{"type":19,"tag":151,"props":1737,"children":1738},{},[1739],{"type":19,"tag":56,"props":1740,"children":1742},{"className":1741},[],[1743],{"type":24,"value":1744},"MolWt",{"type":19,"tag":151,"props":1746,"children":1747},{},[1748],{"type":24,"value":1749},"分子量",{"type":19,"tag":151,"props":1751,"children":1752},{},[1753],{"type":24,"value":1754},"分子越大，吸收、代谢、穿膜等性质通常会受影响",{"type":19,"tag":124,"props":1756,"children":1757},{},[1758,1767,1772],{"type":19,"tag":151,"props":1759,"children":1760},{},[1761],{"type":19,"tag":56,"props":1762,"children":1764},{"className":1763},[],[1765],{"type":24,"value":1766},"LogP",{"type":19,"tag":151,"props":1768,"children":1769},{},[1770],{"type":24,"value":1771},"脂水分配系数",{"type":19,"tag":151,"props":1773,"children":1774},{},[1775],{"type":24,"value":1776},"越高越偏亲脂，越低越偏亲水",{"type":19,"tag":124,"props":1778,"children":1779},{},[1780,1789,1794],{"type":19,"tag":
151,"props":1781,"children":1782},{},[1783],{"type":19,"tag":56,"props":1784,"children":1786},{"className":1785},[],[1787],{"type":24,"value":1788},"TPSA",{"type":19,"tag":151,"props":1790,"children":1791},{},[1792],{"type":24,"value":1793},"拓扑极性表面积",{"type":19,"tag":151,"props":1795,"children":1796},{},[1797],{"type":24,"value":1798},"越大通常越不容易穿过脂质膜或血脑屏障",{"type":19,"tag":124,"props":1800,"children":1801},{},[1802,1811,1816],{"type":19,"tag":151,"props":1803,"children":1804},{},[1805],{"type":19,"tag":56,"props":1806,"children":1808},{"className":1807},[],[1809],{"type":24,"value":1810},"HBD",{"type":19,"tag":151,"props":1812,"children":1813},{},[1814],{"type":24,"value":1815},"氢键供体数量",{"type":19,"tag":151,"props":1817,"children":1818},{},[1819,1821,1827,1829,1835],{"type":24,"value":1820},"比如 ",{"type":19,"tag":56,"props":1822,"children":1824},{"className":1823},[],[1825],{"type":24,"value":1826},"-OH",{"type":24,"value":1828},"、",{"type":19,"tag":56,"props":1830,"children":1832},{"className":1831},[],[1833],{"type":24,"value":1834},"-NH",{"type":24,"value":1836}," 这类能给出氢键的结构",{"type":19,"tag":124,"props":1838,"children":1839},{},[1840,1849,1854],{"type":19,"tag":151,"props":1841,"children":1842},{},[1843],{"type":19,"tag":56,"props":1844,"children":1846},{"className":1845},[],[1847],{"type":24,"value":1848},"HBA",{"type":19,"tag":151,"props":1850,"children":1851},{},[1852],{"type":24,"value":1853},"氢键受体数量",{"type":19,"tag":151,"props":1855,"children":1856},{},[1857],{"type":24,"value":1858},"比如 O、N 这类能接受氢键的原子",{"type":19,"tag":124,"props":1860,"children":1861},{},[1862,1871,1876],{"type":19,"tag":151,"props":1863,"children":1864},{},[1865],{"type":19,"tag":56,"props":1866,"children":1868},{"className":1867},[],[1869],{"type":24,"value":1870},"Rotatable 
Bonds",{"type":19,"tag":151,"props":1872,"children":1873},{},[1874],{"type":24,"value":1875},"可旋转键数量",{"type":19,"tag":151,"props":1877,"children":1878},{},[1879],{"type":24,"value":1880},"越多，分子越柔软，构象空间越大",{"type":19,"tag":124,"props":1882,"children":1883},{},[1884,1893,1898],{"type":19,"tag":151,"props":1885,"children":1886},{},[1887],{"type":19,"tag":56,"props":1888,"children":1890},{"className":1889},[],[1891],{"type":24,"value":1892},"Ring Count",{"type":19,"tag":151,"props":1894,"children":1895},{},[1896],{"type":24,"value":1897},"环数量",{"type":19,"tag":151,"props":1899,"children":1900},{},[1901],{"type":24,"value":1902},"环结构影响分子的刚性和结合方式",{"type":19,"tag":20,"props":1904,"children":1905},{},[1906],{"type":24,"value":1907},"用 RDKit 计算这些描述符：",{"type":19,"tag":50,"props":1909,"children":1911},{"className":680,"code":1910,"language":291,"meta":7,"style":7},"from rdkit import Chem\nfrom rdkit.Chem import Descriptors, Lipinski, rdMolDescriptors\n\n\ndef calc_basic_descriptors(smiles: str) -> dict[str, float | int]:\n    mol = Chem.MolFromSmiles(smiles)\n    if mol is None:\n        raise ValueError(f\"无效的 SMILES: {smiles}\")\n\n    return {\n        \"mol_wt\": round(Descriptors.MolWt(mol), 2),\n        \"logp\": round(Descriptors.MolLogP(mol), 2),\n        \"tpsa\": round(Descriptors.TPSA(mol), 2),\n        \"hbd\": Lipinski.NumHDonors(mol),\n        \"hba\": Lipinski.NumHAcceptors(mol),\n        \"rotatable_bonds\": Lipinski.NumRotatableBonds(mol),\n        \"ring_count\": rdMolDescriptors.CalcNumRings(mol),\n    
}\n\n\nprint(calc_basic_descriptors(\"CC(=O)Oc1ccccc1C(=O)O\"))\n",[1912],{"type":19,"tag":56,"props":1913,"children":1914},{"__ignoreMap":7},[1915,1922,1930,1937,1944,1952,1959,1966,1973,1980,1988,1996,2004,2012,2020,2028,2036,2044,2052,2059,2066],{"type":19,"tag":60,"props":1916,"children":1917},{"class":62,"line":63},[1918],{"type":19,"tag":60,"props":1919,"children":1920},{},[1921],{"type":24,"value":693},{"type":19,"tag":60,"props":1923,"children":1924},{"class":62,"line":615},[1925],{"type":19,"tag":60,"props":1926,"children":1927},{},[1928],{"type":24,"value":1929},"from rdkit.Chem import Descriptors, Lipinski, rdMolDescriptors\n",{"type":19,"tag":60,"props":1931,"children":1932},{"class":62,"line":624},[1933],{"type":19,"tag":60,"props":1934,"children":1935},{"emptyLinePlaceholder":699},[1936],{"type":24,"value":702},{"type":19,"tag":60,"props":1938,"children":1939},{"class":62,"line":633},[1940],{"type":19,"tag":60,"props":1941,"children":1942},{"emptyLinePlaceholder":699},[1943],{"type":24,"value":702},{"type":19,"tag":60,"props":1945,"children":1946},{"class":62,"line":642},[1947],{"type":19,"tag":60,"props":1948,"children":1949},{},[1950],{"type":24,"value":1951},"def calc_basic_descriptors(smiles: str) -> dict[str, float | 
int]:\n",{"type":19,"tag":60,"props":1953,"children":1954},{"class":62,"line":728},[1955],{"type":19,"tag":60,"props":1956,"children":1957},{},[1958],{"type":24,"value":832},{"type":19,"tag":60,"props":1960,"children":1961},{"class":62,"line":737},[1962],{"type":19,"tag":60,"props":1963,"children":1964},{},[1965],{"type":24,"value":840},{"type":19,"tag":60,"props":1967,"children":1968},{"class":62,"line":746},[1969],{"type":19,"tag":60,"props":1970,"children":1971},{},[1972],{"type":24,"value":848},{"type":19,"tag":60,"props":1974,"children":1975},{"class":62,"line":1142},[1976],{"type":19,"tag":60,"props":1977,"children":1978},{"emptyLinePlaceholder":699},[1979],{"type":24,"value":702},{"type":19,"tag":60,"props":1981,"children":1982},{"class":62,"line":1150},[1983],{"type":19,"tag":60,"props":1984,"children":1985},{},[1986],{"type":24,"value":1987},"    return {\n",{"type":19,"tag":60,"props":1989,"children":1990},{"class":62,"line":1158},[1991],{"type":19,"tag":60,"props":1992,"children":1993},{},[1994],{"type":24,"value":1995},"        \"mol_wt\": round(Descriptors.MolWt(mol), 2),\n",{"type":19,"tag":60,"props":1997,"children":1998},{"class":62,"line":1304},[1999],{"type":19,"tag":60,"props":2000,"children":2001},{},[2002],{"type":24,"value":2003},"        \"logp\": round(Descriptors.MolLogP(mol), 2),\n",{"type":19,"tag":60,"props":2005,"children":2006},{"class":62,"line":1313},[2007],{"type":19,"tag":60,"props":2008,"children":2009},{},[2010],{"type":24,"value":2011},"        \"tpsa\": round(Descriptors.TPSA(mol), 2),\n",{"type":19,"tag":60,"props":2013,"children":2014},{"class":62,"line":1321},[2015],{"type":19,"tag":60,"props":2016,"children":2017},{},[2018],{"type":24,"value":2019},"        \"hbd\": Lipinski.NumHDonors(mol),\n",{"type":19,"tag":60,"props":2021,"children":2022},{"class":62,"line":1329},[2023],{"type":19,"tag":60,"props":2024,"children":2025},{},[2026],{"type":24,"value":2027},"        \"hba\": 
Lipinski.NumHAcceptors(mol),\n",{"type":19,"tag":60,"props":2029,"children":2030},{"class":62,"line":1337},[2031],{"type":19,"tag":60,"props":2032,"children":2033},{},[2034],{"type":24,"value":2035},"        \"rotatable_bonds\": Lipinski.NumRotatableBonds(mol),\n",{"type":19,"tag":60,"props":2037,"children":2038},{"class":62,"line":1345},[2039],{"type":19,"tag":60,"props":2040,"children":2041},{},[2042],{"type":24,"value":2043},"        \"ring_count\": rdMolDescriptors.CalcNumRings(mol),\n",{"type":19,"tag":60,"props":2045,"children":2046},{"class":62,"line":1354},[2047],{"type":19,"tag":60,"props":2048,"children":2049},{},[2050],{"type":24,"value":2051},"    }\n",{"type":19,"tag":60,"props":2053,"children":2054},{"class":62,"line":1362},[2055],{"type":19,"tag":60,"props":2056,"children":2057},{"emptyLinePlaceholder":699},[2058],{"type":24,"value":702},{"type":19,"tag":60,"props":2060,"children":2061},{"class":62,"line":1370},[2062],{"type":19,"tag":60,"props":2063,"children":2064},{"emptyLinePlaceholder":699},[2065],{"type":24,"value":702},{"type":19,"tag":60,"props":2067,"children":2068},{"class":62,"line":1379},[2069],{"type":19,"tag":60,"props":2070,"children":2071},{},[2072],{"type":24,"value":2073},"print(calc_basic_descriptors(\"CC(=O)Oc1ccccc1C(=O)O\"))\n",{"type":19,"tag":20,"props":2075,"children":2076},{},[2077],{"type":24,"value":2078},"描述符最大的优点是可解释。",{"type":19,"tag":20,"props":2080,"children":2081},{},[2082],{"type":24,"value":2083},"当模型预测一个分子溶解性不好时，我们至少可以回头看：是不是 LogP 太高？是不是 TPSA 太大？是不是氢键供受体数量异常？这些特征和化学直觉之间有对应关系。",{"type":19,"tag":20,"props":2085,"children":2086},{},[2087],{"type":24,"value":2088},"它的缺点也很明显：描述符是人工设计的汇总特征，表达能力有限。两个分子的分子量、LogP、TPSA 可能很接近，但局部结构完全不同。",{"type":19,"tag":20,"props":2090,"children":2091},{},[2092],{"type":24,"value":2093},"这就需要分子指纹。",{"type":19,"tag":75,"props":2095,"children":2097},{"id":2096},"lipinski-五规则一个经典但不能迷信的经验规则",[2098],{"type":24,"value":2099},"Lipinski 
五规则：一个经典但不能迷信的经验规则",{"type":19,"tag":20,"props":2101,"children":2102},{},[2103],{"type":24,"value":2104},"在小分子药物里，经常会看到 Lipinski 五规则。它用于粗略判断一个分子是否可能具备较好的口服成药性。",{"type":19,"tag":20,"props":2106,"children":2107},{},[2108],{"type":24,"value":2109},"常见阈值是：",{"type":19,"tag":50,"props":2111,"children":2113},{"className":52,"code":2112,"language":24,"meta":7,"style":7},"分子量 \u003C= 500\nLogP \u003C= 5\n氢键供体 HBD \u003C= 5\n氢键受体 HBA \u003C= 10\n",[2114],{"type":19,"tag":56,"props":2115,"children":2116},{"__ignoreMap":7},[2117,2125,2133,2141],{"type":19,"tag":60,"props":2118,"children":2119},{"class":62,"line":63},[2120],{"type":19,"tag":60,"props":2121,"children":2122},{},[2123],{"type":24,"value":2124},"分子量 \u003C= 500\n",{"type":19,"tag":60,"props":2126,"children":2127},{"class":62,"line":615},[2128],{"type":19,"tag":60,"props":2129,"children":2130},{},[2131],{"type":24,"value":2132},"LogP \u003C= 5\n",{"type":19,"tag":60,"props":2134,"children":2135},{"class":62,"line":624},[2136],{"type":19,"tag":60,"props":2137,"children":2138},{},[2139],{"type":24,"value":2140},"氢键供体 HBD \u003C= 5\n",{"type":19,"tag":60,"props":2142,"children":2143},{"class":62,"line":633},[2144],{"type":19,"tag":60,"props":2145,"children":2146},{},[2147],{"type":24,"value":2148},"氢键受体 HBA \u003C= 10\n",{"type":19,"tag":20,"props":2150,"children":2151},{},[2152],{"type":24,"value":2153},"名字里的“五”指的是这些阈值都是 5 的倍数，并不是说一共有五条规则；它也不是一个模型，更不是药物能否成功的判决书。它更像早期过滤条件：如果一个分子严重违反这些规则，后续 ADMET 
风险可能更高。",{"type":19,"tag":20,"props":2155,"children":2156},{},[2157],{"type":24,"value":2158},"在工程系统里，我更倾向于把它当成一个解释性指标，而不是硬性真理：",{"type":19,"tag":50,"props":2160,"children":2162},{"className":52,"code":2161,"language":24,"meta":7,"style":7},"这个分子违反了几条规则？\n违反的是分子量、LogP，还是氢键供受体？\n是否需要结合具体靶点、给药方式和实验数据重新判断？\n",[2163],{"type":19,"tag":56,"props":2164,"children":2165},{"__ignoreMap":7},[2166,2174,2182],{"type":19,"tag":60,"props":2167,"children":2168},{"class":62,"line":63},[2169],{"type":19,"tag":60,"props":2170,"children":2171},{},[2172],{"type":24,"value":2173},"这个分子违反了几条规则？\n",{"type":19,"tag":60,"props":2175,"children":2176},{"class":62,"line":615},[2177],{"type":19,"tag":60,"props":2178,"children":2179},{},[2180],{"type":24,"value":2181},"违反的是分子量、LogP，还是氢键供受体？\n",{"type":19,"tag":60,"props":2183,"children":2184},{"class":62,"line":624},[2185],{"type":19,"tag":60,"props":2186,"children":2187},{},[2188],{"type":24,"value":2189},"是否需要结合具体靶点、给药方式和实验数据重新判断？\n",{"type":19,"tag":20,"props":2191,"children":2192},{},[2193],{"type":24,"value":2194},"AI 制药里很容易把计算结果说得太满。但描述符和经验规则只能提供线索，不能替代实验验证。",{"type":19,"tag":75,"props":2196,"children":2198},{"id":2197},"morgan-指纹把局部结构编码成高维向量",[2199],{"type":24,"value":2200},"Morgan 指纹：把局部结构编码成高维向量",{"type":19,"tag":20,"props":2202,"children":2203},{},[2204],{"type":24,"value":2205},"分子描述符像是在回答：“这个分子总体上是什么性质？”",{"type":19,"tag":20,"props":2207,"children":2208},{},[2209],{"type":24,"value":2210},"Morgan 指纹更像是在回答：“这个分子里出现过哪些局部结构？”",{"type":19,"tag":20,"props":2212,"children":2213},{},[2214],{"type":24,"value":2215},"RDKit 里的 Morgan 指纹属于 circular fingerprints。它会围绕每个原子，在一定半径内观察局部化学环境，然后把这些局部环境编码到固定长度的向量里。",{"type":19,"tag":20,"props":2217,"children":2218},{},[2219],{"type":24,"value":2220},"一个常见设置是：",{"type":19,"tag":50,"props":2222,"children":2224},{"className":52,"code":2223,"language":24,"meta":7,"style":7},"radius = 2\nfpSize = 
2048\n",[2225],{"type":19,"tag":56,"props":2226,"children":2227},{"__ignoreMap":7},[2228,2236],{"type":19,"tag":60,"props":2229,"children":2230},{"class":62,"line":63},[2231],{"type":19,"tag":60,"props":2232,"children":2233},{},[2234],{"type":24,"value":2235},"radius = 2\n",{"type":19,"tag":60,"props":2237,"children":2238},{"class":62,"line":615},[2239],{"type":19,"tag":60,"props":2240,"children":2241},{},[2242],{"type":24,"value":2243},"fpSize = 2048\n",{"type":19,"tag":20,"props":2245,"children":2246},{},[2247],{"type":24,"value":2248},"也就是生成一个 2048 位的 bit vector。每一位可以粗略理解为某种结构模式是否出现过。",{"type":19,"tag":20,"props":2250,"children":2251},{},[2252],{"type":24,"value":2253},"新版 RDKit 文档中，更推荐用 fingerprint generator 的方式生成 Morgan 指纹：",{"type":19,"tag":50,"props":2255,"children":2257},{"className":680,"code":2256,"language":291,"meta":7,"style":7},"import numpy as np\nfrom rdkit import Chem, DataStructs\nfrom rdkit.Chem import AllChem\n\n\ndef morgan_fingerprint(smiles: str, radius: int = 2, fp_size: int = 2048):\n    mol = Chem.MolFromSmiles(smiles)\n    if mol is None:\n        raise ValueError(f\"无效的 SMILES: {smiles}\")\n\n    fpgen = AllChem.GetMorganGenerator(radius=radius, fpSize=fp_size)\n    fingerprint = fpgen.GetFingerprint(mol)\n\n    array = np.zeros((fp_size,), dtype=np.int8)\n    DataStructs.ConvertToNumpyArray(fingerprint, array)\n    return array\n\n\nfp = morgan_fingerprint(\"CC(=O)Oc1ccccc1C(=O)O\")\nprint(fp.shape)\n",[2258],{"type":19,"tag":56,"props":2259,"children":2260},{"__ignoreMap":7},[2261,2269,2277,2285,2292,2299,2307,2314,2321,2328,2335,2343,2351,2358,2366,2374,2382,2389,2396,2404],{"type":19,"tag":60,"props":2262,"children":2263},{"class":62,"line":63},[2264],{"type":19,"tag":60,"props":2265,"children":2266},{},[2267],{"type":24,"value":2268},"import numpy as np\n",{"type":19,"tag":60,"props":2270,"children":2271},{"class":62,"line":615},[2272],{"type":19,"tag":60,"props":2273,"children":2274},{},[2275],{"type":24,"value":2276},"from rdkit 
import Chem, DataStructs\n",{"type":19,"tag":60,"props":2278,"children":2279},{"class":62,"line":624},[2280],{"type":19,"tag":60,"props":2281,"children":2282},{},[2283],{"type":24,"value":2284},"from rdkit.Chem import AllChem\n",{"type":19,"tag":60,"props":2286,"children":2287},{"class":62,"line":633},[2288],{"type":19,"tag":60,"props":2289,"children":2290},{"emptyLinePlaceholder":699},[2291],{"type":24,"value":702},{"type":19,"tag":60,"props":2293,"children":2294},{"class":62,"line":642},[2295],{"type":19,"tag":60,"props":2296,"children":2297},{"emptyLinePlaceholder":699},[2298],{"type":24,"value":702},{"type":19,"tag":60,"props":2300,"children":2301},{"class":62,"line":728},[2302],{"type":19,"tag":60,"props":2303,"children":2304},{},[2305],{"type":24,"value":2306},"def morgan_fingerprint(smiles: str, radius: int = 2, fp_size: int = 2048):\n",{"type":19,"tag":60,"props":2308,"children":2309},{"class":62,"line":737},[2310],{"type":19,"tag":60,"props":2311,"children":2312},{},[2313],{"type":24,"value":832},{"type":19,"tag":60,"props":2315,"children":2316},{"class":62,"line":746},[2317],{"type":19,"tag":60,"props":2318,"children":2319},{},[2320],{"type":24,"value":840},{"type":19,"tag":60,"props":2322,"children":2323},{"class":62,"line":1142},[2324],{"type":19,"tag":60,"props":2325,"children":2326},{},[2327],{"type":24,"value":848},{"type":19,"tag":60,"props":2329,"children":2330},{"class":62,"line":1150},[2331],{"type":19,"tag":60,"props":2332,"children":2333},{"emptyLinePlaceholder":699},[2334],{"type":24,"value":702},{"type":19,"tag":60,"props":2336,"children":2337},{"class":62,"line":1158},[2338],{"type":19,"tag":60,"props":2339,"children":2340},{},[2341],{"type":24,"value":2342},"    fpgen = AllChem.GetMorganGenerator(radius=radius, fpSize=fp_size)\n",{"type":19,"tag":60,"props":2344,"children":2345},{"class":62,"line":1304},[2346],{"type":19,"tag":60,"props":2347,"children":2348},{},[2349],{"type":24,"value":2350},"    fingerprint = 
fpgen.GetFingerprint(mol)\n",{"type":19,"tag":60,"props":2352,"children":2353},{"class":62,"line":1313},[2354],{"type":19,"tag":60,"props":2355,"children":2356},{"emptyLinePlaceholder":699},[2357],{"type":24,"value":702},{"type":19,"tag":60,"props":2359,"children":2360},{"class":62,"line":1321},[2361],{"type":19,"tag":60,"props":2362,"children":2363},{},[2364],{"type":24,"value":2365},"    array = np.zeros((fp_size,), dtype=np.int8)\n",{"type":19,"tag":60,"props":2367,"children":2368},{"class":62,"line":1329},[2369],{"type":19,"tag":60,"props":2370,"children":2371},{},[2372],{"type":24,"value":2373},"    DataStructs.ConvertToNumpyArray(fingerprint, array)\n",{"type":19,"tag":60,"props":2375,"children":2376},{"class":62,"line":1337},[2377],{"type":19,"tag":60,"props":2378,"children":2379},{},[2380],{"type":24,"value":2381},"    return array\n",{"type":19,"tag":60,"props":2383,"children":2384},{"class":62,"line":1345},[2385],{"type":19,"tag":60,"props":2386,"children":2387},{"emptyLinePlaceholder":699},[2388],{"type":24,"value":702},{"type":19,"tag":60,"props":2390,"children":2391},{"class":62,"line":1354},[2392],{"type":19,"tag":60,"props":2393,"children":2394},{"emptyLinePlaceholder":699},[2395],{"type":24,"value":702},{"type":19,"tag":60,"props":2397,"children":2398},{"class":62,"line":1362},[2399],{"type":19,"tag":60,"props":2400,"children":2401},{},[2402],{"type":24,"value":2403},"fp = morgan_fingerprint(\"CC(=O)Oc1ccccc1C(=O)O\")\n",{"type":19,"tag":60,"props":2405,"children":2406},{"class":62,"line":1370},[2407],{"type":19,"tag":60,"props":2408,"children":2409},{},[2410],{"type":24,"value":2411},"print(fp.shape)\n",{"type":19,"tag":20,"props":2413,"children":2414},{},[2415,2417,2423],{"type":24,"value":2416},"输出的 ",{"type":19,"tag":56,"props":2418,"children":2420},{"className":2419},[],[2421],{"type":24,"value":2422},"fp",{"type":24,"value":2424}," 
就可以作为机器学习模型的输入特征。",{"type":19,"tag":20,"props":2426,"children":2427},{},[2428],{"type":24,"value":2429},"和描述符相比，Morgan 指纹的特点是：",{"type":19,"tag":116,"props":2431,"children":2432},{},[2433,2454],{"type":19,"tag":120,"props":2434,"children":2435},{},[2436],{"type":19,"tag":124,"props":2437,"children":2438},{},[2439,2444,2449],{"type":19,"tag":128,"props":2440,"children":2441},{},[2442],{"type":24,"value":2443},"特征",{"type":19,"tag":128,"props":2445,"children":2446},{},[2447],{"type":24,"value":2448},"分子描述符",{"type":19,"tag":128,"props":2450,"children":2451},{},[2452],{"type":24,"value":2453},"Morgan 指纹",{"type":19,"tag":144,"props":2455,"children":2456},{},[2457,2475,2493,2511],{"type":19,"tag":124,"props":2458,"children":2459},{},[2460,2465,2470],{"type":19,"tag":151,"props":2461,"children":2462},{},[2463],{"type":24,"value":2464},"维度",{"type":19,"tag":151,"props":2466,"children":2467},{},[2468],{"type":24,"value":2469},"通常较少",{"type":19,"tag":151,"props":2471,"children":2472},{},[2473],{"type":24,"value":2474},"通常较高，比如 2048 
位",{"type":19,"tag":124,"props":2476,"children":2477},{},[2478,2483,2488],{"type":19,"tag":151,"props":2479,"children":2480},{},[2481],{"type":24,"value":2482},"可解释性",{"type":19,"tag":151,"props":2484,"children":2485},{},[2486],{"type":24,"value":2487},"强",{"type":19,"tag":151,"props":2489,"children":2490},{},[2491],{"type":24,"value":2492},"弱一些",{"type":19,"tag":124,"props":2494,"children":2495},{},[2496,2501,2506],{"type":19,"tag":151,"props":2497,"children":2498},{},[2499],{"type":24,"value":2500},"表达重点",{"type":19,"tag":151,"props":2502,"children":2503},{},[2504],{"type":24,"value":2505},"整体物化性质",{"type":19,"tag":151,"props":2507,"children":2508},{},[2509],{"type":24,"value":2510},"局部结构模式",{"type":19,"tag":124,"props":2512,"children":2513},{},[2514,2519,2524],{"type":19,"tag":151,"props":2515,"children":2516},{},[2517],{"type":24,"value":2518},"适合场景",{"type":19,"tag":151,"props":2520,"children":2521},{},[2522],{"type":24,"value":2523},"表格模型、baseline、解释分析",{"type":19,"tag":151,"props":2525,"children":2526},{},[2527],{"type":24,"value":2528},"相似性搜索、QSAR、机器学习输入",{"type":19,"tag":20,"props":2530,"children":2531},{},[2532],{"type":24,"value":2533},"真实项目里，两者不一定二选一。很多 baseline 会把描述符和指纹都算出来，再比较不同特征组合的效果。",{"type":19,"tag":75,"props":2535,"children":2537},{"id":2536},"tanimoto-相似度用指纹做分子相似性搜索",[2538],{"type":24,"value":2539},"Tanimoto 相似度：用指纹做分子相似性搜索",{"type":19,"tag":20,"props":2541,"children":2542},{},[2543],{"type":24,"value":2544},"有了分子指纹之后，就可以比较两个分子有多像。",{"type":19,"tag":20,"props":2546,"children":2547},{},[2548],{"type":24,"value":2549},"化学信息学里常用的指标之一是 Tanimoto similarity。对于 bit vector 来说，它可以粗略理解成：",{"type":19,"tag":50,"props":2551,"children":2553},{"className":52,"code":2552,"language":24,"meta":7,"style":7},"两个分子共同打开的 bit 数量 \u002F 至少在一个分子中打开的 bit 
数量\n",[2554],{"type":19,"tag":56,"props":2555,"children":2556},{"__ignoreMap":7},[2557],{"type":19,"tag":60,"props":2558,"children":2559},{"class":62,"line":63},[2560],{"type":19,"tag":60,"props":2561,"children":2562},{},[2563],{"type":24,"value":2552},{"type":19,"tag":20,"props":2565,"children":2566},{},[2567],{"type":24,"value":2568},"值越接近 1，说明两个指纹越相似；越接近 0，说明差异越大。",{"type":19,"tag":20,"props":2570,"children":2571},{},[2572],{"type":24,"value":2573},"用 RDKit 计算两个分子的 Tanimoto 相似度：",{"type":19,"tag":50,"props":2575,"children":2577},{"className":680,"code":2576,"language":291,"meta":7,"style":7},"from rdkit import Chem, DataStructs\nfrom rdkit.Chem import AllChem\n\n\nfpgen = AllChem.GetMorganGenerator(radius=2, fpSize=2048)\n\n\ndef get_fp(smiles: str):\n    mol = Chem.MolFromSmiles(smiles)\n    if mol is None:\n        raise ValueError(f\"无效的 SMILES: {smiles}\")\n    return fpgen.GetFingerprint(mol)\n\n\naspirin = get_fp(\"CC(=O)Oc1ccccc1C(=O)O\")\nbenzoic_acid = get_fp(\"O=C(O)c1ccccc1\")\n\nsimilarity = DataStructs.TanimotoSimilarity(aspirin, 
benzoic_acid)\nprint(similarity)\n",[2578],{"type":19,"tag":56,"props":2579,"children":2580},{"__ignoreMap":7},[2581,2588,2595,2602,2609,2617,2624,2631,2639,2646,2653,2660,2668,2675,2682,2690,2698,2705,2713],{"type":19,"tag":60,"props":2582,"children":2583},{"class":62,"line":63},[2584],{"type":19,"tag":60,"props":2585,"children":2586},{},[2587],{"type":24,"value":2276},{"type":19,"tag":60,"props":2589,"children":2590},{"class":62,"line":615},[2591],{"type":19,"tag":60,"props":2592,"children":2593},{},[2594],{"type":24,"value":2284},{"type":19,"tag":60,"props":2596,"children":2597},{"class":62,"line":624},[2598],{"type":19,"tag":60,"props":2599,"children":2600},{"emptyLinePlaceholder":699},[2601],{"type":24,"value":702},{"type":19,"tag":60,"props":2603,"children":2604},{"class":62,"line":633},[2605],{"type":19,"tag":60,"props":2606,"children":2607},{"emptyLinePlaceholder":699},[2608],{"type":24,"value":702},{"type":19,"tag":60,"props":2610,"children":2611},{"class":62,"line":642},[2612],{"type":19,"tag":60,"props":2613,"children":2614},{},[2615],{"type":24,"value":2616},"fpgen = AllChem.GetMorganGenerator(radius=2, fpSize=2048)\n",{"type":19,"tag":60,"props":2618,"children":2619},{"class":62,"line":728},[2620],{"type":19,"tag":60,"props":2621,"children":2622},{"emptyLinePlaceholder":699},[2623],{"type":24,"value":702},{"type":19,"tag":60,"props":2625,"children":2626},{"class":62,"line":737},[2627],{"type":19,"tag":60,"props":2628,"children":2629},{"emptyLinePlaceholder":699},[2630],{"type":24,"value":702},{"type":19,"tag":60,"props":2632,"children":2633},{"class":62,"line":746},[2634],{"type":19,"tag":60,"props":2635,"children":2636},{},[2637],{"type":24,"value":2638},"def get_fp(smiles: 
str):\n",{"type":19,"tag":60,"props":2640,"children":2641},{"class":62,"line":1142},[2642],{"type":19,"tag":60,"props":2643,"children":2644},{},[2645],{"type":24,"value":832},{"type":19,"tag":60,"props":2647,"children":2648},{"class":62,"line":1150},[2649],{"type":19,"tag":60,"props":2650,"children":2651},{},[2652],{"type":24,"value":840},{"type":19,"tag":60,"props":2654,"children":2655},{"class":62,"line":1158},[2656],{"type":19,"tag":60,"props":2657,"children":2658},{},[2659],{"type":24,"value":848},{"type":19,"tag":60,"props":2661,"children":2662},{"class":62,"line":1304},[2663],{"type":19,"tag":60,"props":2664,"children":2665},{},[2666],{"type":24,"value":2667},"    return fpgen.GetFingerprint(mol)\n",{"type":19,"tag":60,"props":2669,"children":2670},{"class":62,"line":1313},[2671],{"type":19,"tag":60,"props":2672,"children":2673},{"emptyLinePlaceholder":699},[2674],{"type":24,"value":702},{"type":19,"tag":60,"props":2676,"children":2677},{"class":62,"line":1321},[2678],{"type":19,"tag":60,"props":2679,"children":2680},{"emptyLinePlaceholder":699},[2681],{"type":24,"value":702},{"type":19,"tag":60,"props":2683,"children":2684},{"class":62,"line":1329},[2685],{"type":19,"tag":60,"props":2686,"children":2687},{},[2688],{"type":24,"value":2689},"aspirin = get_fp(\"CC(=O)Oc1ccccc1C(=O)O\")\n",{"type":19,"tag":60,"props":2691,"children":2692},{"class":62,"line":1337},[2693],{"type":19,"tag":60,"props":2694,"children":2695},{},[2696],{"type":24,"value":2697},"benzoic_acid = get_fp(\"O=C(O)c1ccccc1\")\n",{"type":19,"tag":60,"props":2699,"children":2700},{"class":62,"line":1345},[2701],{"type":19,"tag":60,"props":2702,"children":2703},{"emptyLinePlaceholder":699},[2704],{"type":24,"value":702},{"type":19,"tag":60,"props":2706,"children":2707},{"class":62,"line":1354},[2708],{"type":19,"tag":60,"props":2709,"children":2710},{},[2711],{"type":24,"value":2712},"similarity = DataStructs.TanimotoSimilarity(aspirin, 
benzoic_acid)\n",{"type":19,"tag":60,"props":2714,"children":2715},{"class":62,"line":1362},[2716],{"type":19,"tag":60,"props":2717,"children":2718},{},[2719],{"type":24,"value":2720},"print(similarity)\n",{"type":19,"tag":20,"props":2722,"children":2723},{},[2724],{"type":24,"value":2725},"This is the foundation of molecular similarity search.",{"type":19,"tag":20,"props":2727,"children":2728},{},[2729],{"type":24,"value":2730},"A simple retrieval system could be designed like this:",{"type":19,"tag":50,"props":2732,"children":2734},{"className":52,"code":2733,"language":24,"meta":7,"style":7},"1. The user enters a SMILES string\n2. RDKit parses it into a Mol\n3. Generate a Morgan fingerprint\n4. Compute Tanimoto similarity against the fingerprint of each molecule already in the database\n5. Return the Top K molecules with the highest similarity\n",[2735],{"type":19,"tag":56,"props":2736,"children":2737},{"__ignoreMap":7},[2738,2746,2754,2762,2770],{"type":19,"tag":60,"props":2739,"children":2740},{"class":62,"line":63},[2741],{"type":19,"tag":60,"props":2742,"children":2743},{},[2744],{"type":24,"value":2745},"1. The user enters a SMILES string\n",{"type":19,"tag":60,"props":2747,"children":2748},{"class":62,"line":615},[2749],{"type":19,"tag":60,"props":2750,"children":2751},{},[2752],{"type":24,"value":2753},"2. RDKit parses it into a Mol\n",{"type":19,"tag":60,"props":2755,"children":2756},{"class":62,"line":624},[2757],{"type":19,"tag":60,"props":2758,"children":2759},{},[2760],{"type":24,"value":2761},"3. Generate a Morgan fingerprint\n",{"type":19,"tag":60,"props":2763,"children":2764},{"class":62,"line":633},[2765],{"type":19,"tag":60,"props":2766,"children":2767},{},[2768],{"type":24,"value":2769},"4. Compute Tanimoto similarity against the fingerprint of each molecule already in the database\n",{"type":19,"tag":60,"props":2771,"children":2772},{"class":62,"line":642},[2773],{"type":19,"tag":60,"props":2774,"children":2775},{},[2776],{"type":24,"value":2777},"5. 
Return the Top K molecules with the highest similarity\n",{"type":19,"tag":20,"props":2779,"children":2780},{},[2781],{"type":24,"value":2782},"This capability is common in drug discovery. For example, you already have a hit molecule and want to find structurally similar candidates, or you want to quickly pull a set of compounds close to a target structure out of a molecular library.",{"type":19,"tag":20,"props":2784,"children":2785},{},[2786,2788,2793],{"type":24,"value":2787},"One caveat still applies: ",{"type":19,"tag":37,"props":2789,"children":2790},{},[2791],{"type":24,"value":2792},"structural similarity does not mean identical pharmacological effect",{"type":24,"value":2794},". Similarity search can only suggest candidate directions; it cannot prove activity, toxicity, selectivity, or drug-likeness.",{"type":19,"tag":75,"props":2796,"children":2798},{"id":2797},"一条最小可用的分子特征流水线",[2799],{"type":24,"value":2797},{"type":19,"tag":20,"props":2801,"children":2802},{},[2803],{"type":24,"value":2804},"Putting the pieces above together gives a minimal usable molecular featurization pipeline:",{"type":19,"tag":50,"props":2806,"children":2808},{"className":680,"code":2807,"language":291,"meta":7,"style":7},"import numpy as np\nfrom rdkit import Chem, DataStructs\nfrom rdkit.Chem import AllChem, Descriptors, Lipinski, rdMolDescriptors\n\n\nfpgen = AllChem.GetMorganGenerator(radius=2, fpSize=2048)\n\n\ndef featurize_smiles(smiles: str) -> dict[str, object]:\n    mol = Chem.MolFromSmiles(smiles)\n    if mol is None:\n        raise ValueError(f\"无效的 SMILES: {smiles}\")\n\n    canonical_smiles = Chem.MolToSmiles(mol, canonical=True)\n\n    descriptors = {\n        \"mol_wt\": round(Descriptors.MolWt(mol), 2),\n        \"logp\": round(Descriptors.MolLogP(mol), 2),\n        \"tpsa\": round(Descriptors.TPSA(mol), 2),\n        \"hbd\": Lipinski.NumHDonors(mol),\n        \"hba\": Lipinski.NumHAcceptors(mol),\n        \"rotatable_bonds\": Lipinski.NumRotatableBonds(mol),\n        \"ring_count\": rdMolDescriptors.CalcNumRings(mol),\n    }\n\n    fingerprint = fpgen.GetFingerprint(mol)\n    fp_array = np.zeros((2048,), dtype=np.int8)\n    DataStructs.ConvertToNumpyArray(fingerprint, fp_array)\n\n    return {\n        \"canonical_smiles\": canonical_smiles,\n        \"descriptors\": descriptors,\n        \"morgan_fingerprint\": fp_array,\n    
}\n",[2809],{"type":19,"tag":56,"props":2810,"children":2811},{"__ignoreMap":7},[2812,2819,2826,2834,2841,2848,2855,2862,2869,2877,2884,2891,2898,2905,2913,2920,2928,2935,2942,2949,2956,2963,2970,2977,2984,2991,2998,3006,3014,3021,3028,3036,3044,3052],{"type":19,"tag":60,"props":2813,"children":2814},{"class":62,"line":63},[2815],{"type":19,"tag":60,"props":2816,"children":2817},{},[2818],{"type":24,"value":2268},{"type":19,"tag":60,"props":2820,"children":2821},{"class":62,"line":615},[2822],{"type":19,"tag":60,"props":2823,"children":2824},{},[2825],{"type":24,"value":2276},{"type":19,"tag":60,"props":2827,"children":2828},{"class":62,"line":624},[2829],{"type":19,"tag":60,"props":2830,"children":2831},{},[2832],{"type":24,"value":2833},"from rdkit.Chem import AllChem, Descriptors, Lipinski, rdMolDescriptors\n",{"type":19,"tag":60,"props":2835,"children":2836},{"class":62,"line":633},[2837],{"type":19,"tag":60,"props":2838,"children":2839},{"emptyLinePlaceholder":699},[2840],{"type":24,"value":702},{"type":19,"tag":60,"props":2842,"children":2843},{"class":62,"line":642},[2844],{"type":19,"tag":60,"props":2845,"children":2846},{"emptyLinePlaceholder":699},[2847],{"type":24,"value":702},{"type":19,"tag":60,"props":2849,"children":2850},{"class":62,"line":728},[2851],{"type":19,"tag":60,"props":2852,"children":2853},{},[2854],{"type":24,"value":2616},{"type":19,"tag":60,"props":2856,"children":2857},{"class":62,"line":737},[2858],{"type":19,"tag":60,"props":2859,"children":2860},{"emptyLinePlaceholder":699},[2861],{"type":24,"value":702},{"type":19,"tag":60,"props":2863,"children":2864},{"class":62,"line":746},[2865],{"type":19,"tag":60,"props":2866,"children":2867},{"emptyLinePlaceholder":699},[2868],{"type":24,"value":702},{"type":19,"tag":60,"props":2870,"children":2871},{"class":62,"line":1142},[2872],{"type":19,"tag":60,"props":2873,"children":2874},{},[2875],{"type":24,"value":2876},"def featurize_smiles(smiles: str) -> dict[str, 
object]:\n",{"type":19,"tag":60,"props":2878,"children":2879},{"class":62,"line":1150},[2880],{"type":19,"tag":60,"props":2881,"children":2882},{},[2883],{"type":24,"value":832},{"type":19,"tag":60,"props":2885,"children":2886},{"class":62,"line":1158},[2887],{"type":19,"tag":60,"props":2888,"children":2889},{},[2890],{"type":24,"value":840},{"type":19,"tag":60,"props":2892,"children":2893},{"class":62,"line":1304},[2894],{"type":19,"tag":60,"props":2895,"children":2896},{},[2897],{"type":24,"value":848},{"type":19,"tag":60,"props":2899,"children":2900},{"class":62,"line":1313},[2901],{"type":19,"tag":60,"props":2902,"children":2903},{"emptyLinePlaceholder":699},[2904],{"type":24,"value":702},{"type":19,"tag":60,"props":2906,"children":2907},{"class":62,"line":1321},[2908],{"type":19,"tag":60,"props":2909,"children":2910},{},[2911],{"type":24,"value":2912},"    canonical_smiles = Chem.MolToSmiles(mol, canonical=True)\n",{"type":19,"tag":60,"props":2914,"children":2915},{"class":62,"line":1329},[2916],{"type":19,"tag":60,"props":2917,"children":2918},{"emptyLinePlaceholder":699},[2919],{"type":24,"value":702},{"type":19,"tag":60,"props":2921,"children":2922},{"class":62,"line":1337},[2923],{"type":19,"tag":60,"props":2924,"children":2925},{},[2926],{"type":24,"value":2927},"    descriptors = 
{\n",{"type":19,"tag":60,"props":2929,"children":2930},{"class":62,"line":1345},[2931],{"type":19,"tag":60,"props":2932,"children":2933},{},[2934],{"type":24,"value":1995},{"type":19,"tag":60,"props":2936,"children":2937},{"class":62,"line":1354},[2938],{"type":19,"tag":60,"props":2939,"children":2940},{},[2941],{"type":24,"value":2003},{"type":19,"tag":60,"props":2943,"children":2944},{"class":62,"line":1362},[2945],{"type":19,"tag":60,"props":2946,"children":2947},{},[2948],{"type":24,"value":2011},{"type":19,"tag":60,"props":2950,"children":2951},{"class":62,"line":1370},[2952],{"type":19,"tag":60,"props":2953,"children":2954},{},[2955],{"type":24,"value":2019},{"type":19,"tag":60,"props":2957,"children":2958},{"class":62,"line":1379},[2959],{"type":19,"tag":60,"props":2960,"children":2961},{},[2962],{"type":24,"value":2027},{"type":19,"tag":60,"props":2964,"children":2965},{"class":62,"line":1388},[2966],{"type":19,"tag":60,"props":2967,"children":2968},{},[2969],{"type":24,"value":2035},{"type":19,"tag":60,"props":2971,"children":2972},{"class":62,"line":1397},[2973],{"type":19,"tag":60,"props":2974,"children":2975},{},[2976],{"type":24,"value":2043},{"type":19,"tag":60,"props":2978,"children":2979},{"class":62,"line":1406},[2980],{"type":19,"tag":60,"props":2981,"children":2982},{},[2983],{"type":24,"value":2051},{"type":19,"tag":60,"props":2985,"children":2986},{"class":62,"line":1415},[2987],{"type":19,"tag":60,"props":2988,"children":2989},{"emptyLinePlaceholder":699},[2990],{"type":24,"value":702},{"type":19,"tag":60,"props":2992,"children":2993},{"class":62,"line":1424},[2994],{"type":19,"tag":60,"props":2995,"children":2996},{},[2997],{"type":24,"value":2350},{"type":19,"tag":60,"props":2999,"children":3000},{"class":62,"line":1433},[3001],{"type":19,"tag":60,"props":3002,"children":3003},{},[3004],{"type":24,"value":3005},"    fp_array = np.zeros((2048,), 
dtype=np.int8)\n",{"type":19,"tag":60,"props":3007,"children":3008},{"class":62,"line":1441},[3009],{"type":19,"tag":60,"props":3010,"children":3011},{},[3012],{"type":24,"value":3013},"    DataStructs.ConvertToNumpyArray(fingerprint, fp_array)\n",{"type":19,"tag":60,"props":3015,"children":3016},{"class":62,"line":1450},[3017],{"type":19,"tag":60,"props":3018,"children":3019},{"emptyLinePlaceholder":699},[3020],{"type":24,"value":702},{"type":19,"tag":60,"props":3022,"children":3023},{"class":62,"line":1459},[3024],{"type":19,"tag":60,"props":3025,"children":3026},{},[3027],{"type":24,"value":1987},{"type":19,"tag":60,"props":3029,"children":3030},{"class":62,"line":1468},[3031],{"type":19,"tag":60,"props":3032,"children":3033},{},[3034],{"type":24,"value":3035},"        \"canonical_smiles\": canonical_smiles,\n",{"type":19,"tag":60,"props":3037,"children":3038},{"class":62,"line":1477},[3039],{"type":19,"tag":60,"props":3040,"children":3041},{},[3042],{"type":24,"value":3043},"        \"descriptors\": descriptors,\n",{"type":19,"tag":60,"props":3045,"children":3046},{"class":62,"line":1486},[3047],{"type":19,"tag":60,"props":3048,"children":3049},{},[3050],{"type":24,"value":3051},"        \"morgan_fingerprint\": fp_array,\n",{"type":19,"tag":60,"props":3053,"children":3054},{"class":62,"line":1494},[3055],{"type":19,"tag":60,"props":3056,"children":3057},{},[3058],{"type":24,"value":2051},{"type":19,"tag":20,"props":3060,"children":3061},{},[3062],{"type":24,"value":3063},"This code already covers the basic capabilities of a small AI 
drug-discovery platform:",{"type":19,"tag":50,"props":3065,"children":3067},{"className":52,"code":3066,"language":24,"meta":7,"style":7},"Input validation\nCanonicalization\nFeature extraction\nVector generation\n",[3068],{"type":19,"tag":56,"props":3069,"children":3070},{"__ignoreMap":7},[3071,3079,3087,3095],{"type":19,"tag":60,"props":3072,"children":3073},{"class":62,"line":63},[3074],{"type":19,"tag":60,"props":3075,"children":3076},{},[3077],{"type":24,"value":3078},"Input validation\n",{"type":19,"tag":60,"props":3080,"children":3081},{"class":62,"line":615},[3082],{"type":19,"tag":60,"props":3083,"children":3084},{},[3085],{"type":24,"value":3086},"Canonicalization\n",{"type":19,"tag":60,"props":3088,"children":3089},{"class":62,"line":624},[3090],{"type":19,"tag":60,"props":3091,"children":3092},{},[3093],{"type":24,"value":3094},"Feature extraction\n",{"type":19,"tag":60,"props":3096,"children":3097},{"class":62,"line":633},[3098],{"type":19,"tag":60,"props":3099,"children":3100},{},[3101],{"type":24,"value":3102},"Vector generation\n",{"type":19,"tag":20,"props":3104,"children":3105},{},[3106],{"type":24,"value":3107},"Whether you go on to build molecular similarity search, property prediction, or classification models, or plug in more complex deep learning models, this pipeline can keep being extended.",{"type":19,"tag":75,"props":3109,"children":3111},{"id":3110},"我对这两周内容的理解",[3112],{"type":24,"value":3110},{"type":19,"tag":20,"props":3114,"children":3115},{},[3116],{"type":24,"value":3117},"When learning RDKit and SMILES, it is easy to get lost in API details: how to call this function, how to write that parameter.",{"type":19,"tag":20,"props":3119,"children":3120},{},[3121],{"type":24,"value":3122},"What really matters, though, is building an engineering map:",{"type":19,"tag":50,"props":3124,"children":3126},{"className":52,"code":3125,"language":24,"meta":7,"style":7},"SMILES is the input format\nMol is RDKit's internal molecule object\nCanonical SMILES gives a unified representation\nMolecule images are for display\nMolecular descriptors provide interpretable features\nMorgan fingerprints provide high-dimensional structural features\nTanimoto similarity powers molecular similarity search\n",[3127],{"type":19,"tag":56,"props":3128,"children":3129},{"__ignoreMap":7},[3130,3138,3146,3154,3162,3170,3178],{"type":19,"tag":60,"props":3131,"children":3132},{"class":62,"line":63},[3133],{"type":19,"tag":60,"props":3134,"children":3135},{},[3136],{"type":24,"value":3137},"SMILES 
is the input format\n",{"type":19,"tag":60,"props":3139,"children":3140},{"class":62,"line":615},[3141],{"type":19,"tag":60,"props":3142,"children":3143},{},[3144],{"type":24,"value":3145},"Mol is RDKit's internal molecule object\n",{"type":19,"tag":60,"props":3147,"children":3148},{"class":62,"line":624},[3149],{"type":19,"tag":60,"props":3150,"children":3151},{},[3152],{"type":24,"value":3153},"Canonical SMILES gives a unified representation\n",{"type":19,"tag":60,"props":3155,"children":3156},{"class":62,"line":633},[3157],{"type":19,"tag":60,"props":3158,"children":3159},{},[3160],{"type":24,"value":3161},"Molecule images are for display\n",{"type":19,"tag":60,"props":3163,"children":3164},{"class":62,"line":642},[3165],{"type":19,"tag":60,"props":3166,"children":3167},{},[3168],{"type":24,"value":3169},"Molecular descriptors provide interpretable features\n",{"type":19,"tag":60,"props":3171,"children":3172},{"class":62,"line":728},[3173],{"type":19,"tag":60,"props":3174,"children":3175},{},[3176],{"type":24,"value":3177},"Morgan fingerprints provide high-dimensional structural features\n",{"type":19,"tag":60,"props":3179,"children":3180},{"class":62,"line":737},[3181],{"type":19,"tag":60,"props":3182,"children":3183},{},[3184],{"type":24,"value":3185},"Tanimoto similarity powers molecular similarity search\n",{"type":19,"tag":20,"props":3187,"children":3188},{},[3189],{"type":24,"value":3190},"Once this map is in place, the machine learning part that follows goes much more smoothly.",{"type":19,"tag":20,"props":3192,"children":3193},{},[3194],{"type":24,"value":3195},"Because you will know that the model does not predict out of thin air. Every number it consumes comes from some earlier step that encoded the molecular structure.",{"type":19,"tag":20,"props":3197,"children":3198},{},[3199],{"type":24,"value":3200},"For AI drug-discovery projects this is also a good reminder: do not chase complex models right away. Get molecular representation, data cleaning, feature extraction, and error handling right first, and the system will have a foundation to keep growing on.",{"type":19,"tag":75,"props":3202,"children":3204},{"id":3203},"参考资料",[3205],{"type":24,"value":3203},{"type":19,"tag":3207,"props":3208,"children":3209},"ul",{},[3210,3223,3233],{"type":19,"tag":3211,"props":3212,"children":3213},"li",{},[3214],{"type":19,"tag":3215,"props":3216,"children":3220},"a",{"href":3217,"rel":3218},"https:\u002F\u002Fwww.rdkit.org\u002Fdocs\u002Findex.html",[3219],"nofollow",[3221],{"type":24,"value":3222},"RDKit 2026.03.2 
Documentation",{"type":19,"tag":3211,"props":3224,"children":3225},{},[3226],{"type":19,"tag":3215,"props":3227,"children":3230},{"href":3228,"rel":3229},"https:\u002F\u002Fwww.rdkit.org\u002Fdocs\u002FGettingStartedInPython.html",[3219],[3231],{"type":24,"value":3232},"Getting Started with the RDKit in Python",{"type":19,"tag":3211,"props":3234,"children":3235},{},[3236],{"type":19,"tag":3215,"props":3237,"children":3240},{"href":3238,"rel":3239},"https:\u002F\u002Fwww.rdkit.org\u002Fdocs\u002Fsource\u002Frdkit.Chem.Descriptors.html",[3219],[3241],{"type":24,"value":3242},"RDKit Descriptors documentation",{"type":19,"tag":3244,"props":3245,"children":3246},"style",{},[3247],{"type":24,"value":3248},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":7,"searchDepth":615,"depth":615,"links":3250},[3251,3252,3253,3254,3255,3256,3257,3258,3259,3260,3261,3262,3263],{"id":77,"depth":615,"text":80},{"id":364,"depth":615,"text":367},{"id":584,"depth":615,"text":587},{"id":661,"depth":615,"text":664},{"id":1030,"depth":615,"text":1033},{"id":1191,"depth":615,"text":1194},{"id":1601,"depth":615,"text":1604},{"id":2096,"depth":615,"text":2099},{"id":2197,"depth":615,"text":2200},{"id":2536,"depth":615,"text":2539},{"id":2797,"depth":615,"text":2797},{"id":3110,"depth":615,"text":3110},{"id":3203,"depth":615,"text":3203},"markdown","content:articles:AI制药理论:rdkit-smiles-descriptors-fingerprints.md","content","articles\u002FAI制药理论\u002Frdkit-smiles-descriptors-fingerprints.md","articles\u002FAI制药理论\u002Frdkit-smiles-descriptors-fingerprints","md",1779811687794]