I/O File Format

This section describes the various input and output files used by the MC_fit program. Perple_X programs use a root name to identify problem-specific I/O files. The root name is specified when the user runs the program BUILD to create the problem definition file. The default root name assigned by BUILD is my_project, this name is assumed here. In addition to the problem definition file my_project.dat generated by BUILD, MC_fit requires a second problem specific file named my_project.imc that is created by the user to define the inversion problem.

Root and/or complete file names may include directory paths, e.g., the root ./my_directory/my_project causes output to be written to the subdirectory my_directory of the current working directory. To change the root name of a project rename my_project.dat and my_project.imc with the desired name.

Input: my_project.dat

It is assumed here that potential users of MC_fit are already familiar with the Perple_X program BUILD used to generate the my_project.dat problem definition file. Should this not be the case, refer to the tutorials listed on the Perple_X website.

To create a problem definition file suitable for MC_fit, run BUILD as though to setup the calculation of a phase diagram section for the observed assemblage:

  • The bulk composition and variable ranges specified in my_project.dat are ignored by MC_fit, i.e., arbitrary values for these parameters can be entered in response to the prompts from BUILD. MC_fit also ignores the molar vs mass input unit specification my_project.dat.

  • Specify the thermodynamic data, solution model, and Perple_X option files to be used by MC_fit. MC_fit reads an additional option file file that is specified in my_project.imc.

  • Specify the components that describe the chemistry of the observed assemblage, ordinarily these are specified as thermodynamic components; however saturated and mobile components may also be specified.

  • The independent phantom variables specified for the phantom phase diagram section calculation, ordinarily pressure and temperature, define the unknown environmental parameters to be determined by MC_fit. If mobile components have been specified, the chemical potential, activity, or fugacity of the component is included among the unknown environmental parameters to be determined by MC_fit.

  • Specify any solution models appropriate for the observed assemblage, as well for any phases that might affect the stability of the observed assemblage.

Note

The thermodynamic data, Perple_X option, and solution model files referenced in my_project.dat are not described here. Refer to the online documentation for details of the thermodynamic data and Perple_X option files. The solution model file, normally solution_model.dat, is to some extent self-documenting.

Note

Specifying component saturation constraints in my_project.dat simplifies the inversion problem by assuring that the corresponding composant is in equilibrium with the predicted assemblage. However, the identity of the composant is not considered as a constraint by MC_fit; therefore, component saturation constraints should not be used when the identity of the composant is critical.

For example, SiO2 should not be saturated if different silica polymorphs are potentially stable over the pressure- temperature range of interest and it is known that a particular polymorph was stable. Conversely, component saturation constraints should be used when it is suspected that a polymorph of a component was stable, but its identity is uncertain (e.g., aragonite vs calcite).

Input: my_project.imc

The general structure of the my_project.imc file consists of keywords with interspersed data as outlined below, line numbers have been added to facilitate discussion. Although the keywords and data must be entered sequentially, there is no requirement that the data must be on a particular line of the my_project.imc file.

 1  OPTION_FILE_NAME
 2
 3  begin_assemblage
 4
 5  sample_name       TEXT
 6
 7  pressure_range    LO#   HI#
 8  temperature_range LO#   HI#
 9
10  MAF(TEXT)_range   LO#   HI#
11  ...
12
13  unmeasured_component  TEXT
14
15  begin_limits
16  TEXT        LO#   HI#
17  ...
18  end_limits
19
20  ...
21
22  begin_bulk
23  TEXT        VAL#  UNC#
24  ...
25  end_bulk
26
27  phase_name  TEXT
28
29  phase_mode  VAL#  UNC#
30
31  begin_comp
32  TEXT        VAL#  UNC#
33  ...
34  end_comp
35
36  end_assemblage

Line 1: The complete name of the MC_fit option file, e.g., MC_fit_option.dat.

Line 3: begin_assemblage signals the beginning of the inversion problem.

Line 5, sample_name: TEXT is a sample identifier, title, etc.

Lines 7-8, pressure_range and temperature_range: The LO# HI# pairs on each line are the lower and upper limits on pressure (bar) and temperature (K) for the inversion problem. Currently there is no means of fixing pressure or temperature for an inversion problem. Such an option will be implemented upon request.

[Optional] Line 10, MAF(TEXT)_range: This keyword, and the relevant data, must be repeated for each mobile component specified in my_project.dat. If my_project.dat does not specify mobile components then this keyword must be omitted. TEXT is the name of a mobile component (e.g., O2) and the LO# HI# pair is the lower and upper limit on the chemical potential (J/mol), activity (non-dimensional), or fugacity (bar) of the component depending on which property has been chosen to represent the component in my_project.dat. The chemical potential/activity/fugacity of a mobile component is an independent inversion parameter, the amount of the mobile component in the observed phases is determined as a dependent parameter.

[Optional] Lines 13-18: This block specifies an unmeasured component, the abundance of which is to be determined by the inversion. The block is repeated as necessary to specify multiple unmeasured components. The TEXT following unmeasured_component keyword (Line 13) is the name of an unmeasured thermodynamic component (e.g., O2, C, CO2, H2O, S). The unmeasured_component keyword is followed by a block bounded by the begin_limits and end_limits keywords that specifies the range of the unmeasured component.

The range of unmeasured components may be specified in two ways, simple or coupled limits. Simple limits are appropriate when the abundance of an unmeasured component, e.g., H2O, is independent of the abundances of other components. Coupled limits are appropriate when the abundance of an unmeasured component, e.g., O2, is dependent on the abundances of other components.

Simple limits are signaled by “Simple” in place of TEXT. LO# and HI# specify the absolute lower- and upper-limits on the molar amount of the unmeasured component.

Note

Amounts specified for simple limits approximate mole fractions because of the normalization used by MC_fit,

In coupled limits, the range of the unmeasured component is specified in terms of other thermodynamic components. On these lines, TEXT specifies a thermodynamic component and LO# and HI# specifies the molar stoichiometric coefficients of the component in the linear equations for the lower- and upper-limits on the unmeasured component. Refer to the sidebars and/or input examples listed below for further information.

[Optional] Lines 20-24: In cases where it is reasonable to assume that the bulk composition of the putative equilibrium assemblage is known, this block specifies that composition. This assumption results in a major simplification of the inversion process. Thus, when justified, use of this option is recommended as the first line of attack on the inversion problem. However, particularly for minor elements and/or phases, insignificant variations in the bulk composition can have a significant effect on the predicted mineralogy due to the simplified nature of thermodynamic solution models. For this reason, inversions that make use of the bulk composition option that yield few or no acceptable solutions should be repeated without this option. The begin_bulk and end_bulk keywords (Lines 20 and 24) delimit the block. Between these keywords, each line specifies the name (TEXT) of a thermodynamic component, followed by its molar or mass amount (VAL#) and the relative or absolute uncertainty (UNC#).

The component names must match those specified in my_project.dat. The bulk composition need not be normalized and need not include the unmeasured thermodynamic components. The component amounts need not be normalized.

When uncertainties are not available, MC_fit can be instructed to estimate relative uncertainties using:

(6)\[\epsilon_{j} = \frac{0.25}{X_{j}^{0.7}}\]

where \(X_j\) is the molar fraction of component \(j\), the power-law exponent approximates the dependence suggested by Nerone et al. ([Nerone2025]), and the coefficient (0.25) is chosen to yield relative uncertainties of 2% when \(X_j = 0.5\) and 20% when \(X_j = 0.05\). The use of Eq 6 is signaled to MC_fit by negative number in place of UNC# for the relevant component. When Eq 6 is used, MC_fit automatically converts the component amounts specified by VAL# to the correct units and normalization.

Lines 27-34: This block specifies the observed phases of the putative relict equilibrium. The entries differ depending on whether the phase is modeled as a pure phase (e.g., quartz) or a solution phase (e.g., garnet).

For solution phases:

Line 27, phase_name: TEXT is the name of a solution model specified in my_project.dat that corresponds to an observed phase.

[Optional] Line 29, phase_mode: VAL# is the volume fraction of the phase, and UNC# is the uncertainty in that volume fraction. If modal fractions are provided, they must be provided for all phases. Modal fractions are automatically renormalized by MC_fit.

Note

Modal data and bulk composition are redundant information. Thus, nothing is gained by providing both. In cases where the bulk composition is unknown or inconsistent, modal data can be used to infer the bulk composition of the observed assemblage. Bulk compositions output by MC_fit without modal data are, in general, non-unique.

Lines 31-34: This block, bounded by the begin_comp and end_comp keywords, specifies the composition of the observed phase. Each line within the block specifies the name (NAME) of a thermodynamic, mobile, or saturated component, followed by its molar or mass amount (VAL#) and its relative or absolute uncertainty (UNC#).

Neither unmeasured components nor components that are not present in the observed phase need be specified. The component names must match those specified in my_project.dat. The component amounts need not be normalized.

As in the case of bulk compositions (Lines 20-24 above), when uncertainties are not available, MC_fit can be instructed to estimate relative uncertainty of a component with Eq 6 by setting UNC# to a negative number.

Note

Whether components that are observed in a phase, but that cannot be predicted by the corresponding solution model, should be included is a matter of judgment. Because such components cannot be predicted by the solution model, their inclusion has the consequence that they must be accommodated by other phases in the predicted assemblage. In general this leads to higher misfit and, in particular, may increase the misfit of the solution model that ultimately accommodates the component. These effects are proportional to the amount of the component. Arguably, when the uncertainty on the abundance of a component includes zero, it should be omitted. When the amount of a component that cannot be predicted by the corresponding solution model is significant, an alternative to simple omission or inclusion is to replace the component by a geochemical proxy that is predicted by the solution model. For example, BaO is a common component of feldspar, but cannot be predicted by current feldspar solution models. As CaO is a reasonable, and thermodynamically well-predicted, proxy for BaO in feldspar, the observed molar amount of CaO in an observed feldspar composition can be augmented by the molar amount of BaO, which is then omitted from the “observed” composition.

For pure phases:

Line 27, phase_name: TEXT is the name of a stoichiometric compound listed in the thermodynamic data file specified by my_project.dat that approximates the observed phase.

[Optional] Line 29, phase_mode: is specified as above for solution phases.

Lines 31-34: This block is omitted for pure phases.

Line 36, end_assemblage: signals the end of the inversion problem specification. Comments may be written after this keyword without using the comment character (|).

Output: my_project.out

The file my_project.out records the progress information and details of the Nelder-Mead optimizations during both the Central Model Analysis and Perturbation Analysis.

Central Model Output

By default, during the Central Model Analysis all successful tries, i.e., Nelder-Mead optimizations that converge to a local minimum, are output. The number of successful tries output can be regulated by the model_output, no_missing_phases, must_fit_all_data and misfit_filter_value options. Additionally, the Nelder-Mead_covariance option can be set to output the correlation-covariance matrix. The output of each successful try is of the form:

 1   --------------------------------------------------------------------------------
 2   Central model Try 44 converged, 42 successes so far.
 3
 4   Misfit function evaluations this try => 259
 5
 6   Misfit this try =  1.011330E-04
 7   Best Misfit so far = 1.011330E-04 obtained on Try 44
 8
 9   Prior probability, PP =  2.253960E-07
10   Bayes score, PP * exp(-Misfit) = 2.253732E-07
11   Best Bayes score so far = 4.084621E-07 obtained on Try 5
12
13                           P_bar         T_K           C_O2          Cpx           Gt
14   Initial coordinates:   3086.20       739.602      0.191001      0.825769      0.174231
15   Final coordinates:     10108.3       1002.02      0.141531      0.260092      0.739908
16
17   This model fits all data within observational uncertainty
18
19   Components of the Misfit Score
20
21        Predicted compositions:       1.011326E-04
22        Extraneous predicted phases:  4.134694E-10
23        Missed observed phases:   0.00000
24
25   The following observed phases are predicted:
26
27                    vol %     Na2O    MgO     Al2O3   CaO     FeO     O2      SiO2
28   Augite(G)
29     predicted*    26.177    0.0134  0.1392  0.0072  0.2164  0.1167  0.0063  0.5008
30     observed*               0.0135  0.1401  0.0073  0.2178  0.1174  0.0000  0.5039
31     residual                0.0001  0.0009  0.0001  0.0014  0.0007 -0.0063  0.0031
32     Composition Misfit:   0.396339E-04
33   Gt(W)
34     predicted*    73.823    0.0000  0.0518  0.1239  0.1561  0.2419  0.0078  0.4185
35     observed*               0.0000  0.0522  0.1249  0.1573  0.2438  0.0000  0.4218
36     residual                0.0000  0.0004  0.0010  0.0012  0.0019 -0.0078  0.0033
37     Composition Misfit:   0.614987E-04
38
39   The following predicted phases are not observed:
40
41                    vol %     Na2O    MgO     Al2O3   CaO     FeO     O2      SiO2
42   q                0.000    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  1.0000
43
44                Effective Bulk*        Chemical Potentials (J/mol)
45   Na2O              3.507                     -777832.
46   MgO              75.016                     -641782.
47   Al2O3            94.301                    -0.165555E+07
48   CaO             173.054                     -745846.
49   FeO             210.941                     -320589.
50   O2                7.464                     -442446.
51   SiO2            443.182                     -891292.
52
53   *normalized molar units
54   --------------------------------------------------------------------------------

Lines 6-7: indicate the misfit of the current result and the current best result. The best_central_model can be identified by scrolling backward from the end of the my_project.out file to the last central model output to find the number of the try that produced the best result.

Lines 9-11: list the prior probability (Eq 15), the current Bayes score for the current result, and the best Bayes score so far.

Line 13: the inversion parameters. Parameters indicated by C_X are the fractional value of unmeasured component X, relative to its lower and upper limits as specified in the my_project.imc file. Because these limits are dependent on the effective bulk composition, which itself may be variable, the actual composition of an unmeasured component is best read from the phase or effective bulk compositions listed later in the output. For inversions in which the bulk compositional information is not input, parameters indicated by a phase name, indicate the fractional amount of the observed composition of the phase used to make up the bulk composition.

Line 14: the starting (the initial guess) coordinate of the Nelder-Mead optimization.

Line 15: the final (converged) coordinate of the Nelder-Mead optimization.

Line 17: indicates whether the predicted assemblage fits all data within its analytical uncertainty (uncertainty_multiplier). Whether models fit all data within uncertainty is not directly related to the misfit. Indeed, a model with low misfit may not fit all data within uncertainty, and vice versa. Consequently, it is a matter of judgment whether fitting all data within its uncertainty should be used as a criterion for accepting or rejecting models. This criterion is implemented at the user’s discretion in MC_fit_plot.

Lines 19-23: the components of the misfit (Eq 8).

Lines 28-38: the predicted and observed compositions, the predicted mode, and misfit, of each successfully predicted observed phase. If observed modal data is available, it is also listed. The predicted modes are only unique if modal and/or bulk compositional observational data has been provided. The phase compositions are normalized to 1 mole.

Lines 40-43: the predicted compositions and modes of any predicted phases that are not observed. If no_missing_phases is set to F, the observed composition and, if available, mode of each observed phase is that is not predicted is also listed.

Lines 45-52: the effective bulk composition and the chemical potential (J/mol) of each component. The chemical potentials are characteristic of the observed assemblage regardless of whether the bulk composition is unique; however, the effective bulk composition is only unique if modal and/or bulk compositional observational data have been provided. The effective bulk composition is normalized to 1000 mole. The composition can be pasted into my_project.dat and used as input for the Perple_X programs MEEMUM or VERTEX to verify that the predicted assemblage is stable at the inversion coordinates and to obtain additional information about the assemblage. The chemical potential of O2 (\(\mu_{\text{O2}}\)) is easily converted to oxygen fugacity (Appendix C).

Perturbation Analysis

The output for each successful try during the perturbation analysis is of the form:

1   --------------------------------------------------------------------------------
2   After 101 successful perturbations:
3                                         P_bar         T_K           C_O2          Cpx           Gt              OBJF
4   Central model parameters:            10108.3       1002.02      0.141531      0.260092      0.739908      1.011330E-04
5   Perturbed central model parameter:   9430.11       1001.87      0.134613      0.221303      0.778697      1.591129E-04
6   Mean perturbed parameters:           10370.6       1007.81      0.139926      0.242894      0.757106      1.636601E-04
7   Parameter standard deviation:        440.918       8.32835      4.849425E-03  3.580837E-02  3.580837E-02  8.408905E-05
8  --------------------------------------------------------------------------------

Line 2: the total number of perturbations is 1 + number_of_perturbations, because the best_central_model result is included in the perturbation analysis statistics. Results are reported only after two successful perturbations.

Line 3: the inversion parameters. The parameters have the same significance, and are listed in the same order, as in the central model output (line 13, above). By default, OBJF is the misfit. If Bayes is set to T, OBJF is the Bayes score.

Line 4: by default, repeats the best_central_model parameters. If perturbation analysis has been run with perturbation_hot_start, this line repeats the user specified central model parameters. These coordinates are the starting point for Nelder-Mead optimization with the perturbed analytical and thermodynamic parameters.

Line 5: the final coordinates of the Nelder-Mead optimization for the perturbed dataset.

Line 6: the mean of the final coordinates of all successful perturbation up to the current iteration.

Line 7: the standard deviation of the inversion parameters for all successful perturbations up to the current iteration.

Output: Console

The information written to the my_project.out is echoed to the user console with some additional progress information controlled by the vital_sign and print_level options.

Output: my_project_central.pts

The my_project_central.pts file contains the results of the central model analysis tries formatted for automated processing. Data in the file is of the form:

1   sym fit   P_bar         T_K           C_O2          Cpx           Gt                OBJF        Marker        Misfit         Bayes         Na2O          MgO           Al2O3         CaO           FeO           O2            SiO2
2     3  1    10014.4       999.954      0.164934      0.596088      0.403912      1.030612E-04   999.000      1.030612E-04  2.337938E-07  8.036604E-03  0.104553      5.480211E-02  0.193370      0.168471      6.946660E-03  0.470766
3     3  1    5666.01       912.995      0.214321      0.994931      5.068593E-03  2.037970E-03   999.000      2.037970E-03  3.602763E-07  1.341390E-02  0.139616      7.914999E-03  0.217487      0.118058      6.325581E-03  0.503510
4   ...
5     1  1    10108.3       1002.02      0.141531      0.260092      0.739908      1.011326E-04   999.000       0.00000       0.00000      1.348223E-02  0.140062      7.319146E-03  0.217793      0.117418      1.012418E-02  0.503926
6     9  0    1000.00       700.000       0.00000       0.00000       0.00000      1.011326E-04   999.000       0.00000       0.00000       0.00000       0.00000       0.00000       0.00000       0.00000       0.00000       0.00000
7    10  0    15000.0       1100.00       1.00000       1.00000       1.00000      1.011326E-04   999.000       0.00000       0.00000       0.00000       0.00000       0.00000       0.00000       0.00000       0.00000       0.00000

The header identifies each column of data. The first column is sym, an integer code with the following significance:

  • 1: best central model inversion coordinate.

  • 3: central model inversion coordinate for an inversion where Bayes has been set to default or F, in which case OBJF is the misfit of the model.

  • 6: central model inversion coordinate for an inversion where Bayes has been set to T, in which case OBJF is the Bayes score of the model.

  • 9: lower bounds for the inversion parameter ranges.

  • 10: upper bounds for the inversion parameter ranges.

sym is followed by:

  • fit, an integer code that is 1 if the model fits all data within observational uncertainty (uncertainty_multiplier), and 0 otherwise.

  • the names of the inversion parameters, as on line 13 of my_project.out.

  • OBJF, the misfit or Bayes score.

  • Marker, a numeric marker, 999.000, used to indicate the n+3’th column, where n is the number of inversion parameters.

  • Misfit, the misfit of the model.

  • Bayes, the Bayes score for the model.

  • the components that define the effective bulk composition of the model (molar units).

Note

  • The best central model inversion coordinate (sym = 1) replicates a coordinate (sym = 3 or 6) output earlier in the file.

  • When using programs other than MC_fit_plot for statistical analysis, care should be taken to ignore the best central model coordinate (sym = 1) and the lower and upper bound coordinates (sym = 9 and 10).

  • Symbol codes 3 and 6 are mutually exclusive.

Output: my_project_perturbed.pts

The my_project_perturbed.pts file contains the results of the perturbation analysis formatted for automated processing.

The format of the my_project_perturbed.pts file is identical to that of my_project_central.pts described above except that the mutually exclusive sym codes for the perturbed central models are 2 (misfit) and 5 (Bayes score).