
LLM in Transition Metal Compounds

Post 1: From DFT to Data-Driven — Why I'm Adding ML to My Computational Toolkit

I've spent years doing first-principles calculations (WIEN2k, VASP, ABINIT, SIESTA, Quantum ESPRESSO, and a handful of other codes) on 3d transition metal chalcogenides: MS, MSe, and MTe compounds, where M is a transition metal. Each calculation gives me a deep, accurate picture of one compound: its band structure, density of states, magnetic moment, and magnetic ordering. But it's slow. A converged DFT+U calculation with proper testing of magnetic configurations can take days, even on a good cluster.

Over the years, I've accumulated results for dozens of these compounds, scattered across papers, notebooks, and output files. And every time, I notice the same thing: the trends across compounds are often more interesting than any single result. Why does one MTe compound open a gap while a structurally similar MSe doesn't? Why does the magnetic ordering flip from antiferromagnetic to ferromagnetic when the M–X–M angle crosses some threshold? I know the qualitative physics (Goodenough–Kanamori–Anderson rules, crystal field theory, Zaanen–Sawatzky–Allen classification), but I've never had a systematic, quantitative cross-compound view.

This is where I think machine learning, and specifically physics-informed ML, can help.

What I mean by "physics-informed"

I'm not interested in black-box prediction. A model that says "band gap = 0.8 eV" with no explanation is useless to me as a researcher — I need to understand why. So my approach is to:

  • Encode the descriptors I already trust from DFT and theory (bond angles, crystal field splitting, d-electron count, electronegativity differences) as explicit features
  • Use these features to fit interpretable models — regression, decision rules, symbolic regression
  • Validate against my own DFT results and Materials Project data
  • End up with formulae or rules I can actually check against the physics
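
To make the first bullet concrete, here is a minimal sketch of what "explicit features" could look like for one compound. The class and function names are hypothetical, the electronegativities are standard Pauling values, and the d-electron count assumes a formal M²⁺ oxidation state, which is a simplification rather than something taken from a calculation. The bond length and angle are expected to come from DFT or experiment.

```python
from dataclasses import dataclass

# Pauling electronegativities (standard tabulated values).
PAULING_EN = {
    "Ti": 1.54, "V": 1.63, "Cr": 1.66, "Mn": 1.55, "Fe": 1.83,
    "Co": 1.88, "Ni": 1.91, "Cu": 1.90,
    "S": 2.58, "Se": 2.55, "Te": 2.10,
}

# Total 4s + 3d valence electrons of the neutral 3d atom.
VALENCE_ELECTRONS = {
    "Ti": 4, "V": 5, "Cr": 6, "Mn": 7, "Fe": 8, "Co": 9, "Ni": 10, "Cu": 11,
}


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical feature record for one MX compound."""
    formula: str
    d_count: int          # formal d-electron count of the metal ion
    en_difference: float  # Pauling EN(X) - EN(M)
    bond_length: float    # M-X distance in angstrom (from DFT or experiment)
    mxm_angle: float      # M-X-M angle in degrees (from DFT or experiment)

    def as_features(self) -> list[float]:
        """Flatten to a numeric row for a regression or rule learner."""
        return [float(self.d_count), self.en_difference,
                self.bond_length, self.mxm_angle]


def build_descriptors(metal, chalcogen, bond_length, mxm_angle, oxidation_state=2):
    """Assemble descriptors; oxidation_state=2 is an assumption, not a computed value."""
    d_count = VALENCE_ELECTRONS[metal] - oxidation_state
    en_diff = round(PAULING_EN[chalcogen] - PAULING_EN[metal], 2)
    return Descriptors(f"{metal}{chalcogen}", d_count, en_diff, bond_length, mxm_angle)
```

Calling `build_descriptors("Mn", "Te", bond_length, mxm_angle)` with structural values from a relaxed cell gives a d⁵ record whose `as_features()` row can go straight into a linear fit or a decision tree. Crystal field splitting would be one more field once it has been computed.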

The goal isn't to replace DFT. It's to build a layer on top of it: a way to rapidly screen candidate compounds, generate hypotheses, and identify which structures are worth the computational cost of a full calculation.

Why now

Two things converged. First, the Materials Project now has structured data for hundreds of transition metal chalcogenides — band gaps, magnetic moments, formation energies, all queryable via API. That's a dataset I couldn't have built by hand in a reasonable time. Second, tools like PySR (symbolic regression) and SHAP (model interpretation) make it possible to go from "the model predicts X" to "the model predicts X because of descriptor Y" — which is the only kind of result I actually trust.
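
As a rough sketch of the Materials Project side, the query could look like the following. It assumes the `mp-api` client package and its `MPRester(...).materials.summary.search` interface as I understand the current release (older releases exposed the same search as `mpr.summary.search`), plus a personal API key. The helper names, the element lists, and the choice of fields are my own for illustration.

```python
from itertools import product

METALS = ["Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu"]
CHALCOGENS = ["S", "Se", "Te"]
FIELDS = ["material_id", "formula_pretty", "band_gap",
          "total_magnetization", "formation_energy_per_atom"]


def binary_chemsys(metals=METALS, chalcogens=CHALCOGENS):
    """Return chemical-system strings such as 'Fe-Te', elements sorted alphabetically."""
    return ["-".join(sorted(pair)) for pair in product(metals, chalcogens)]


def fetch_chalcogenides(api_key):
    """Query summary data for all binary M-X systems (needs network access and a key)."""
    # Imported here so binary_chemsys() works even without mp-api installed.
    from mp_api.client import MPRester

    with MPRester(api_key) as mpr:
        docs = mpr.materials.summary.search(chemsys=binary_chemsys(), fields=FIELDS)
    # Convert each returned document into a plain dict of the requested fields.
    return [{name: getattr(doc, name) for name in FIELDS} for doc in docs]
```

Restricting `fields` keeps the response small. Note that a binary chemical system returns every stoichiometry in that system, so MX entries still have to be filtered out afterwards.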

Application: octahedral crystal field splitting

As a first concrete descriptor, one that every compound in the eventual dataset will carry, consider the crystal field splitting Δ in an octahedral environment. In the point-charge picture it scales roughly as Δ ∝ 1/(M–X bond length)⁵. Combined with the d-electron count, Δ determines the high-spin/low-spin state (low spin when Δ exceeds the pairing energy, a choice that only arises for d⁴ to d⁷ ions) and gives a rough first read on whether the compound trends toward a wide gap, a moderate gap, or a metallic state.

Interactive widget: octahedral crystal field splitting. Inputs: M–X bond length (slider, shown at 2.45) and d-electron count (slider, shown at 5). Outputs: Δ (crystal field) in eV, spin state, and estimated gap regime, showing how Δ and the t₂g / eg occupation shift as the inputs change.

Illustrative scaling only: Δ ∝ 1/(M–X)⁵, with the high-spin/low-spin split estimated from Δ versus a typical pairing energy. Real values come from DFT.

This is the kind of descriptor that will feed the dataset built in the next post: pulled from my own DFT outputs where available, extracted from papers via LLM parsing, and cross-checked against Materials Project entries.
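
For readers who want the crystal field estimate in static form, here is a minimal sketch of the same logic: the 1/(M–X)⁵ scaling, the comparison of Δ with a pairing energy, and the resulting t₂g / eg filling. The three constants are illustrative assumptions of mine, not fitted or DFT values, and the function names are hypothetical. The gap-regime estimate is left out because the mapping from Δ to a regime isn't spelled out in this post.

```python
# Illustrative assumptions only: none of these constants is a fitted or DFT value.
REF_BOND_LENGTH = 2.45  # angstrom, reference M-X distance
REF_DELTA = 1.0         # eV, splitting assumed at the reference distance
PAIRING_ENERGY = 2.0    # eV, assumed "typical" pairing energy


def delta_oct(bond_length):
    """Point-charge scaling: delta proportional to 1 / bond_length**5."""
    return REF_DELTA * (REF_BOND_LENGTH / bond_length) ** 5


def occupation(d_count, low_spin):
    """Return (t2g, eg) electron counts for an octahedral d^n ion."""
    if not 0 <= d_count <= 10:
        raise ValueError("d_count must be between 0 and 10")
    if low_spin:
        # Fill the three t2g orbitals completely before touching eg.
        t2g = min(d_count, 6)
        return t2g, d_count - t2g
    if d_count <= 5:
        # High spin: singly occupy t2g (3 orbitals), then eg (2 orbitals).
        t2g = min(d_count, 3)
        return t2g, d_count - t2g
    # High spin beyond d5: start pairing, t2g first, then eg.
    extra = d_count - 5
    t2g_extra = min(extra, 3)
    return 3 + t2g_extra, 2 + (extra - t2g_extra)


def unpaired(t2g, eg):
    """Count unpaired electrons from the orbital fillings."""
    return (t2g if t2g <= 3 else 6 - t2g) + (eg if eg <= 2 else 4 - eg)


def classify(d_count, bond_length):
    """Estimate delta, spin state, and filling for one ion and bond length."""
    delta = delta_oct(bond_length)
    low_spin = delta > PAIRING_ENERGY
    t2g, eg = occupation(d_count, low_spin)
    if 4 <= d_count <= 7:
        label = "low spin" if low_spin else "high spin"
    else:
        label = "no HS/LS choice"  # both fillings coincide outside d4-d7
    return {"delta_eV": delta, "spin_state": label,
            "t2g": t2g, "eg": eg, "unpaired": unpaired(t2g, eg)}
```

With these assumed constants, `classify(5, 2.45)` returns a high-spin t₂g³ eg² ion with five unpaired electrons, while shortening the bond to 2.0 for a d⁶ ion pushes Δ above the pairing energy and gives low-spin t₂g⁶.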

What this series will document

This is a research notebook, written as I go. I expect false starts, dead ends, and revisions. The plan, roughly:

  • Build a unified dataset combining my own DFT results, literature data extracted from papers, and Materials Project entries
  • Engineer physics-informed descriptors — the same quantities I'd reason about manually, just made explicit and computable
  • Look for structure → electronic property relationships first (band gaps, metal-insulator transitions)
  • Then structure → magnetic property relationships (ordering type, moments, Néel temperatures)
  • Then try to capture the coupling between electronic and magnetic behavior
  • Use symbolic regression to search for formula forms, not just fit coefficients
  • End with a "rule bank" — a set of interpretable, physics-grounded predictive rules I can apply to compounds I haven't calculated yet
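
To illustrate the difference between searching for a formula form and fitting coefficients, here is a toy sketch. It is not PySR and not real symbolic regression: it only tries a handful of power-law forms y = a·x⁻ⁿ, fits the amplitude for each, and ranks them by squared error. PySR searches a far larger expression space. The data in the check are synthetic, generated from the 1/(M–X)⁵ scaling with the same illustrative constants as before, so the only thing it demonstrates is that the search recovers a known exponent.

```python
def fit_power_law(xs, ys, exponent):
    """Least-squares amplitude a for y = a * x**(-exponent), plus the squared error."""
    basis = [x ** (-exponent) for x in xs]
    a = sum(b * y for b, y in zip(basis, ys)) / sum(b * b for b in basis)
    sse = sum((y - a * b) ** 2 for b, y in zip(basis, ys))
    return a, sse


def search_forms(xs, ys, exponents=range(1, 9)):
    """Try each candidate exponent and return (n, a, sse) tuples, best fit first."""
    results = [(n, *fit_power_law(xs, ys, n)) for n in exponents]
    return sorted(results, key=lambda r: r[2])


# Synthetic check only: values generated from the assumed scaling, not DFT data.
bond_lengths = [2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8]
deltas = [1.0 * (2.45 / r) ** 5 for r in bond_lengths]

best_n, best_a, best_sse = search_forms(bond_lengths, deltas)[0]
print(f"best form: y = {best_a:.2f} * x^-{best_n}  (sse = {best_sse:.2e})")
```

On real DFT data the ranking will be much less clean, and the interesting cases are the ones where the best-ranked form disagrees with the textbook exponent.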

If it works, the payoff is concrete: faster screening of candidate metal chalcogenide compounds, better-targeted DFT calculations, and maybe a few rules worth publishing alongside the usual first-principles results.

Next post: building the dataset itself — what data I'm pulling from my own papers, what I'm getting from Materials Project, and the inevitable mess of reconciling the two.



