Class OxidationStateData
java.lang.Object
_global.tri.oxidationstates.fitting.OxidationStateData
This is the main class for data sets (e.g. testing, training data).
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
class | OxidationStateData.Entry | Represents a single data point in the data set.
Constructor Summary
Constructors
Constructor | Description
OxidationStateData(String fileName, boolean removeNonInteger, double hullCutoff, boolean removeZeroOxidation, boolean removeZintl, String structDir) | Read a data set from a given file and remove entries according to the given options.
OxidationStateData(String fileName, String structDir) | Read a data set from a given file.
OxidationStateData(Collection<OxidationStateData.Entry> entries, String structDir) | Create a data set with the provided entries.
Method Summary
Modifier and Type | Method | Description
OxidationStateData.Entry | addEntry(String structureID, String composition, IonFactory.Ion[] ions, String[] sources, double energyAboveHull, double gii) | Add an entry to this data set.
 | copy() | Returns a copy of this data set.
void | dataKeepOnlyIons(Set<IonFactory.Ion> allowedIons) | Only keep entries for which all ions are in the given set.
 | getCountsByIon() | Returns a map in which the keys are the ions contained in this data set and the values are the number of entries that contain the corresponding ion.
 | getEntry(int entryNum) | Returns the "entryNum"'th entry in this data set.
 | getKnownOxidationStates() | Returns a map of oxidation states for each ion type in this data set.
int | getMaxIntegerOxidationState() | Gets the highest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.
int | getMinIntegerOxidationState() | Gets the lowest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.
 | getStructDir() | Returns the directory with atomic structure files for the entries.
 | getUniqueEntries(boolean keepPolyIons) | When entries with compositions written in terms of polyatomic ions are added to the data set, there will be two entries with the same ID: one with a composition written in terms of monatomic ions, and one with a composition written in terms of polyatomic ions.
int | numEntries() | The total number of entries in this data set.
void | printNumEntriesByIon() | Prints to standard output the number of entries containing each ion in this data set.
void | removeEntries(OxidationStateData entriesToRemove) | Removes the entries in the given data set.
void | removeEntriesWithNonIntegerStates() | Removes all entries that contain oxidation states that are not within 0.01 of an integer.
void | removeEntriesWithNonIntegerStates(double tolerance) | Removes all entries that contain oxidation states that are not within "tolerance" of an integer.
void | removeEntriesWithZeroOxidationStates() | Removes all entries for which at least one of the ions has an oxidation state of zero.
void | removeEntriesWithZintlIons() | Removes all entries with Zintl ions, as determined by the ZintlIonFinder.
void | removeGIIDecrease(String structDirectory, String refStructDirectory, LikelihoodCalculator calculator) | Removes all entries for which the GII in structDirectory is less than the GII in refStructDirectory.
void | removeNonChargeBalancedStructures(String structDirName) | Removes all entries that do not have charge-neutral structures, defined as structures for which the oxidation states of all atoms in each unit cell add up to zero.
void | removeRandomEntries(double percentToRemove) | Removes a random subset of this data set.
void | removeStructuresNotNearHull(double energyAboveHull) | Removes all structures with energy above hull greater than the provided value. Entries with an undefined energy above the hull are also removed.
void | removeUncommonIonsByCount(int minAllowedCount) | Removes all entries that contain rare ions, where "rare" ions are those that appear in fewer than minAllowedCount entries.
void | removeUncommonOxidationStates(double minAllowedFraction) | Removes entries containing rare ions, where an ion is rare if the fraction of entries it appears in for its ion type is less than minAllowedFraction.
void | removeUnstableEntries(double maxEnergyAboveHull) | Removes all structures with energy above hull greater than the provided value. Entries with an undefined energy above the hull are kept.
void | setEnergiesAboveHull(Map<String, Double> energiesByID) | Sets the energies above the hull for entries in the given map.
 | splitData(int numSplits) | Randomly split the data into numSplits test sets.
void | writeFile(fileName) | Writes a file containing this data set.
void | writeFile(writer) | Writes a file containing this data set.
void | writeLikelihoods(LikelihoodCalculator calculator, String fileName) | Writes the calculated likelihood scores, along with information about the composition and ions, for all entries in this data set to the given file.
-
Constructor Details
-
OxidationStateData
Create a data set with the provided entries.
Parameters:
entries - The entries to include in this data set.
structDir - A directory that contains structure files for each of the entries, in VASP POSCAR format.
-
OxidationStateData
Read a data set from a given file.
Parameters:
fileName - The name of the given file.
structDir - A directory that contains structure files for each of the entries, in VASP POSCAR format.
-
OxidationStateData
public OxidationStateData(String fileName, boolean removeNonInteger, double hullCutoff, boolean removeZeroOxidation, boolean removeZintl, String structDir)
Read a data set from a given file and remove entries according to the given options.
Parameters:
fileName - The name of the given file.
removeNonInteger - Remove all entries that contain oxidation states with non-integer values.
hullCutoff - An energy in eV / atom. All entries with an energy above the convex hull greater than this value will be removed.
removeZeroOxidation - Remove entries with oxidation states of zero.
removeZintl - Remove entries that contain Zintl ions, as determined by the ZintlIonFinder.
structDir - A directory that contains structure files for each of the entries, in VASP POSCAR format.
-
-
Method Details
-
writeFile
Writes a file containing this data set.
Parameters:
fileName - The name of the file to be written.
-
writeFile
Writes a file containing this data set.
Parameters:
writer - The file will be written to this writer.
Throws:
IOException - if there is an I/O error.
-
removeRandomEntries
public void removeRandomEntries(double percentToRemove)
Removes a random subset of this data set.
Parameters:
percentToRemove - The percent of entries to remove (the number of entries removed is rounded off).
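A minimal sketch of the random removal described above, assuming that percentToRemove is on a 0 to 100 scale and that "rounded off" means the number of removed entries is rounded to the nearest integer. The generic list, the class name, and the seeded java.util.Random are illustrative choices, not the library's internals.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class RandomRemovalSketch {

    // Returns a copy of `entries` with a random subset removed.
    // Assumption: percentToRemove is a percentage in [0, 100]; the count is rounded.
    static <T> List<T> removeRandomEntries(List<T> entries, double percentToRemove, Random rng) {
        long rounded = Math.round(entries.size() * percentToRemove / 100.0);
        // Clamp so that out-of-range percentages cannot break subList.
        int numToRemove = (int) Math.max(0, Math.min(entries.size(), rounded));
        List<T> shuffled = new ArrayList<>(entries);
        Collections.shuffle(shuffled, rng);
        // Drop the first numToRemove shuffled entries and keep the rest.
        return new ArrayList<>(shuffled.subList(numToRemove, shuffled.size()));
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        // 25% of 10 entries is 2.5, which Math.round turns into 3 removals.
        List<Integer> kept = removeRandomEntries(data, 25.0, new Random(42));
        System.out.println(kept.size()); // prints 7
    }
}
```

Passing a seeded Random makes the removed subset reproducible, which is convenient when comparing models fitted on different reduced data sets.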
-
copy
Returns a copy of this data set.
Returns:
a copy of this data set
-
removeEntries
Removes the entries in the given data set. Note that the entry objects need to be exactly the same objects, i.e. both this data set and the "entriesToRemove" data set should be derived from the same data set.
Parameters:
entriesToRemove - A data set containing the entries to be removed. Both data sets must be derived from the same data set, so that the entry objects are identical.
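Requiring the very same entry objects amounts to comparison by identity rather than by content. The sketch below uses a hypothetical SketchEntry record (not OxidationStateData.Entry) to show why this matters: records compare equal by content, so an identity-based set built from java.util.IdentityHashMap ensures that only the exact objects are removed. Whether the library uses an identity set or simply relies on a non-overridden Object.equals is an assumption; both give identity semantics.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class IdentityRemovalSketch {

    // Hypothetical stand-in for an entry. As a record, its equals() compares contents.
    record SketchEntry(String id, String composition) {}

    // Removes from `data` exactly those objects that also occur, by reference, in `toRemove`.
    static void removeEntries(List<SketchEntry> data, List<SketchEntry> toRemove) {
        Set<SketchEntry> identitySet = Collections.newSetFromMap(new IdentityHashMap<>());
        identitySet.addAll(toRemove);
        data.removeIf(identitySet::contains);
    }

    public static void main(String[] args) {
        SketchEntry nacl = new SketchEntry("id-1", "NaCl");
        SketchEntry mgo = new SketchEntry("id-2", "MgO");
        List<SketchEntry> data = new ArrayList<>(List.of(nacl, mgo));

        // An equal but distinct copy of `nacl` is not removed...
        removeEntries(data, List.of(new SketchEntry("id-1", "NaCl")));
        System.out.println(data.size()); // prints 2

        // ...but the original object is.
        removeEntries(data, List.of(nacl));
        System.out.println(data.size()); // prints 1
    }
}
```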
-
dataKeepOnlyIons
Only keep entries for which all ions are in the given set.
Parameters:
allowedIons - Entries will only be kept if all ions in the entry are in this set.
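The filter above is a subset test: an entry survives only if its set of ions is contained in allowedIons. A minimal sketch, assuming a hypothetical SketchEntry whose ions are plain strings such as "Fe+3" rather than IonFactory.Ion objects, and filtering a list in place as the void method does:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class KeepOnlyIonsSketch {

    // Hypothetical entry: an ID plus its ions written as strings such as "Fe+3".
    record SketchEntry(String id, List<String> ions) {}

    // Removes, in place, every entry that contains at least one ion outside allowedIons.
    static void keepOnlyIons(List<SketchEntry> entries, Set<String> allowedIons) {
        entries.removeIf(e -> !allowedIons.containsAll(e.ions()));
    }

    public static void main(String[] args) {
        List<SketchEntry> entries = new ArrayList<>(List.of(
                new SketchEntry("a", List.of("Fe+2", "O-2")),
                new SketchEntry("b", List.of("Fe+3", "O-2"))));
        keepOnlyIons(entries, Set.of("Fe+3", "O-2"));
        System.out.println(entries.size()); // prints 1 (only entry "b" remains)
    }
}
```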
-
splitData
Randomly split the data into numSplits test sets. The union of all of the test sets is this complete data set, and all test sets will be approximately the same size. The split is done so that no composition appears in more than one test set, so a composition never appears in both a test set and its corresponding training set.
Parameters:
numSplits - The number of test sets to generate.
Returns:
An array of generated test sets.
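The documentation does not say how the splits are balanced. A minimal sketch, assuming a greedy strategy: group the entries by composition, shuffle the groups, and give each group to the split that is currently smallest. SketchEntry is a hypothetical stand-in for OxidationStateData.Entry, and the splits are returned as lists rather than as OxidationStateData objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class CompositionSplitSketch {

    // Hypothetical entry with an ID and a composition string.
    record SketchEntry(String id, String composition) {}

    static List<List<SketchEntry>> splitData(List<SketchEntry> entries, int numSplits, Random rng) {
        // Group entries by composition so that a composition never straddles two splits.
        Map<String, List<SketchEntry>> byComposition = new LinkedHashMap<>();
        for (SketchEntry e : entries) {
            byComposition.computeIfAbsent(e.composition(), k -> new ArrayList<>()).add(e);
        }
        List<List<SketchEntry>> groups = new ArrayList<>(byComposition.values());
        Collections.shuffle(groups, rng);

        List<List<SketchEntry>> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            splits.add(new ArrayList<>());
        }
        // Greedy balancing: each whole group goes to the currently smallest split.
        for (List<SketchEntry> group : groups) {
            List<SketchEntry> smallest = splits.get(0);
            for (List<SketchEntry> s : splits) {
                if (s.size() < smallest.size()) {
                    smallest = s;
                }
            }
            smallest.addAll(group);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<SketchEntry> entries = List.of(
                new SketchEntry("1", "NaCl"), new SketchEntry("2", "NaCl"),
                new SketchEntry("3", "MgO"), new SketchEntry("4", "CaO"),
                new SketchEntry("5", "ZnS"), new SketchEntry("6", "ZnS"));
        for (List<SketchEntry> split : splitData(entries, 2, new Random(7))) {
            System.out.println(split);
        }
    }
}
```

For cross-validation, the training set for split i would be the union of all other splits; because every composition lives in exactly one split, no composition can appear in both.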
-
getStructDir
Returns the directory with atomic structure files for the entries.
Returns:
the directory with atomic structure files for the entries
-
getCountsByIon
Returns a map in which the keys are the ions contained in this data set and the values are the number of entries that contain the corresponding ion.
Returns:
a map in which the keys are the ions contained in this data set and the values are the number of entries that contain the corresponding ion.
-
removeUncommonIonsByCount
public void removeUncommonIonsByCount(int minAllowedCount)
Removes all entries that contain rare ions, where "rare" ions are those that appear in fewer than minAllowedCount entries.
Parameters:
minAllowedCount - The minimum number of entries an ion must appear in to not be considered rare.
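The count-based filter above can be sketched in two steps: count, per ion, the number of entries containing it (the quantity getCountsByIon reports), then drop every entry holding an ion below the threshold. This sketch assumes a single pass with counts computed once up front; whether the library repeats the filter after removals (which can make further ions rare) is not documented. SketchEntry and the string ion labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class RareIonCountSketch {

    // Hypothetical entry: an ID plus its ions written as strings such as "Na+1".
    record SketchEntry(String id, List<String> ions) {}

    // Number of entries containing each ion; an ion is counted at most once per entry.
    static Map<String, Integer> countsByIon(List<SketchEntry> entries) {
        Map<String, Integer> counts = new HashMap<>();
        for (SketchEntry e : entries) {
            for (String ion : new HashSet<>(e.ions())) {
                counts.merge(ion, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Single pass: counts are computed once, then every entry holding a rare ion is removed.
    static void removeUncommonIonsByCount(List<SketchEntry> entries, int minAllowedCount) {
        Map<String, Integer> counts = countsByIon(entries);
        entries.removeIf(e -> e.ions().stream().anyMatch(ion -> counts.get(ion) < minAllowedCount));
    }

    public static void main(String[] args) {
        List<SketchEntry> entries = new ArrayList<>(List.of(
                new SketchEntry("a", List.of("Na+1", "Cl-1")),
                new SketchEntry("b", List.of("Na+1", "Cl-1")),
                new SketchEntry("c", List.of("Na+1", "Br-1"))));
        removeUncommonIonsByCount(entries, 2); // Br-1 appears in only one entry
        System.out.println(entries.size()); // prints 2
    }
}
```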
-
printNumEntriesByIon
public void printNumEntriesByIon()
Prints to standard output the number of entries containing each ion in this data set.
-
removeUncommonOxidationStates
public void removeUncommonOxidationStates(double minAllowedFraction)
Removes entries containing rare ions, where an ion is rare if the fraction of entries it appears in for its ion type is less than minAllowedFraction.
Parameters:
minAllowedFraction - An ion will be considered rare if the fraction of entries it appears in for its ion type is less than this value. For example, if A2+ appears in 10 entries and A3+ appears in 90 entries, A2+ appears in 10 / (10 + 90) = 0.1 of the entries containing A, so all entries containing A2+ will be removed if minAllowedFraction is greater than 0.1.
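A minimal sketch of the fraction test above, reproducing the A2+ / A3+ example. It assumes the denominator is the number of entries containing any ion of the same type (for example, any oxidation state of element A) and that the comparison is a strict "less than". The Ion and SketchEntry records are hypothetical stand-ins, not IonFactory.Ion or OxidationStateData.Entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RareOxidationStateSketch {

    // Hypothetical ion: an ion type (e.g. an element) plus an oxidation state.
    record Ion(String type, int oxidationState) {}

    record SketchEntry(String id, List<Ion> ions) {}

    static void removeUncommonOxidationStates(List<SketchEntry> entries, double minAllowedFraction) {
        Map<Ion, Integer> entriesWithIon = new HashMap<>();
        Map<String, Integer> entriesWithType = new HashMap<>();
        for (SketchEntry e : entries) {
            // Count each ion and each ion type at most once per entry.
            Set<Ion> ions = new HashSet<>(e.ions());
            Set<String> types = new HashSet<>();
            for (Ion ion : ions) {
                entriesWithIon.merge(ion, 1, Integer::sum);
                types.add(ion.type());
            }
            for (String t : types) {
                entriesWithType.merge(t, 1, Integer::sum);
            }
        }
        // Remove entries containing an ion whose share of its type's entries is too small.
        entries.removeIf(e -> e.ions().stream().anyMatch(ion ->
                entriesWithIon.get(ion).doubleValue() / entriesWithType.get(ion.type())
                        < minAllowedFraction));
    }

    public static void main(String[] args) {
        List<SketchEntry> entries = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            entries.add(new SketchEntry("x" + i, List.of(new Ion("A", 2))));
        }
        for (int i = 0; i < 90; i++) {
            entries.add(new SketchEntry("y" + i, List.of(new Ion("A", 3))));
        }
        removeUncommonOxidationStates(entries, 0.15); // A2+ fraction is 0.1 < 0.15
        System.out.println(entries.size()); // prints 90
    }
}
```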
-
getUniqueEntries
When entries with compositions written in terms of polyatomic ions are added to the data set, there will be two entries with the same ID: one with a composition written in terms of monatomic ions, and one with a composition written in terms of polyatomic ions. This method removes one of the two entries with the same ID. It does not modify the data set; instead, it returns a map of the remaining entries keyed by entry ID.
Parameters:
keepPolyIons - If true, remove the entries with duplicate IDs that have monatomic ions. If false, remove the entries with duplicate IDs that have polyatomic ions.
Returns:
A map of the remaining entries keyed by entry ID.
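The de-duplication above can be sketched as a single pass that builds a map keyed by ID and lets the preferred variant overwrite the other. The hasPolyatomicIons flag, the composition strings, and the SketchEntry record are hypothetical; how the real Entry marks polyatomic compositions is not described here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UniqueEntriesSketch {

    // Hypothetical entry: hasPolyatomicIons marks the polyatomic-ion version of a composition.
    record SketchEntry(String id, String composition, boolean hasPolyatomicIons) {}

    // Returns one entry per ID without modifying the input list.
    static Map<String, SketchEntry> getUniqueEntries(List<SketchEntry> entries, boolean keepPolyIons) {
        Map<String, SketchEntry> unique = new LinkedHashMap<>();
        for (SketchEntry e : entries) {
            SketchEntry existing = unique.get(e.id());
            // Take the first entry for an ID; replace it only with the preferred kind.
            if (existing == null || e.hasPolyatomicIons() == keepPolyIons) {
                unique.put(e.id(), e);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<SketchEntry> entries = List.of(
                new SketchEntry("id-1", "Ca C O3", false),
                new SketchEntry("id-1", "Ca (CO3)", true),
                new SketchEntry("id-2", "Na Cl", false));
        Map<String, SketchEntry> unique = getUniqueEntries(entries, true);
        System.out.println(unique.get("id-1").composition()); // prints Ca (CO3)
        System.out.println(unique.size()); // prints 2
    }
}
```

The result does not depend on whether the monatomic or the polyatomic variant was added first.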
-
removeEntriesWithZeroOxidationStates
public void removeEntriesWithZeroOxidationStates()
Removes all entries for which at least one of the ions has an oxidation state of zero.
-
removeGIIDecrease
public void removeGIIDecrease(String structDirectory, String refStructDirectory, LikelihoodCalculator calculator)
Removes all entries for which the GII in structDirectory is less than the GII in refStructDirectory. A tolerance of 1E-6 is used when comparing GIIs. TODO: rewrite this method so that it just reads the GII from the entry (that field did not exist when this method was written).
Parameters:
structDirectory - A directory containing structures, where the description field gives the GII.
refStructDirectory - A directory containing the reference structures, where the description field gives the GII.
calculator - A likelihood calculator used for logging purposes (tracking the likelihood score of the removed entries).
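The comparison above can be sketched as follows, assuming the tolerance means a decrease is only counted when it exceeds 1E-6, and that an entry with no reference GII is kept. The real method reads GIIs from the POSCAR description fields in the two directories; this sketch takes hypothetical maps of GII values keyed by entry ID instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GiiDecreaseSketch {

    // Tolerance stated in the method description.
    static final double TOLERANCE = 1e-6;

    // Assumed reading: a decrease counts only if it is larger than the tolerance.
    static boolean giiDecreased(double gii, double refGii) {
        return gii < refGii - TOLERANCE;
    }

    // Returns the IDs whose GII decreased relative to the reference (hypothetical inputs).
    static List<String> idsToRemove(Map<String, Double> giiById, Map<String, Double> refGiiById) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Double> e : giiById.entrySet()) {
            Double ref = refGiiById.get(e.getKey());
            if (ref != null && giiDecreased(e.getValue(), ref)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Double> gii = new LinkedHashMap<>();
        gii.put("s1", 0.10);      // clearly below its reference
        gii.put("s2", 0.1999995); // below its reference, but within the tolerance
        Map<String, Double> ref = Map.of("s1", 0.15, "s2", 0.2);
        System.out.println(idsToRemove(gii, ref)); // prints [s1]
    }
}
```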
-
removeNonChargeBalancedStructures
Removes all entries that do not have charge-neutral structures, defined as structures for which the oxidation states of all atoms in each unit cell add up to zero.
Parameters:
structDirName - The name of the directory containing the structure files in VASP POSCAR format.
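The neutrality condition above is a weighted sum: each species' oxidation state times its number of atoms in the cell must add up to zero. A minimal sketch, assuming one oxidation state per species block (a mixed-valence species would need separate blocks) and a hypothetical tolerance of 1e-6 that is not taken from the library:

```java
final class ChargeBalanceSketch {

    // Hypothetical tolerance for treating the net charge as zero; not taken from the library.
    static final double TOLERANCE = 1e-6;

    // statesBySpecies[i]: oxidation state assigned to species i.
    // countsBySpecies[i]: number of atoms of species i in the unit cell
    // (analogous to the atom-count line of a POSCAR file).
    static boolean isChargeBalanced(double[] statesBySpecies, int[] countsBySpecies) {
        if (statesBySpecies.length != countsBySpecies.length) {
            throw new IllegalArgumentException("one oxidation state is needed per species");
        }
        double netCharge = 0.0;
        for (int i = 0; i < statesBySpecies.length; i++) {
            netCharge += statesBySpecies[i] * countsBySpecies[i];
        }
        return Math.abs(netCharge) <= TOLERANCE;
    }

    public static void main(String[] args) {
        // Fe2O3 cell with 4 Fe and 6 O: 4 * (+3) + 6 * (-2) = 0.
        System.out.println(isChargeBalanced(new double[] {3, -2}, new int[] {4, 6})); // prints true
        // Assigning Fe +2 instead leaves a net charge of -4.
        System.out.println(isChargeBalanced(new double[] {2, -2}, new int[] {4, 6})); // prints false
    }
}
```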
-
removeEntriesWithZintlIons
public void removeEntriesWithZintlIons()
Removes all entries with Zintl ions, as determined by the ZintlIonFinder.
-
getKnownOxidationStates
Returns a map of oxidation states for each ion type in this data set. The map is keyed by the ion type ID and the values are the oxidation states, in ascending order.
Returns:
a map of oxidation states for each ion type in this data set. The map is keyed by the ion type ID and the values are the oxidation states, in ascending order.
-
removeEntriesWithNonIntegerStates
public void removeEntriesWithNonIntegerStates()
Removes all entries that contain oxidation states that are not within 0.01 of an integer.
-
removeEntriesWithNonIntegerStates
public void removeEntriesWithNonIntegerStates(double tolerance)
Removes all entries that contain oxidation states that are not within "tolerance" of an integer.
Parameters:
tolerance - The maximum allowed difference between the oxidation state and an integer to be considered an integer oxidation state.
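The "within tolerance of an integer" test above compares each state with its nearest integer. A minimal sketch, assuming the bound is inclusive; the no-argument overload corresponds to a tolerance of 0.01. The class and method names are hypothetical.

```java
final class NearIntegerSketch {

    // True if `state` is within `tolerance` of the nearest integer.
    // Assumption: the bound is inclusive.
    static boolean isNearInteger(double state, double tolerance) {
        return Math.abs(state - Math.rint(state)) <= tolerance;
    }

    // An entry survives the filter only if every one of its oxidation states is near an integer.
    static boolean keepEntry(double[] oxidationStates, double tolerance) {
        for (double s : oxidationStates) {
            if (!isNearInteger(s, tolerance)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(keepEntry(new double[] {2.004, -1.998}, 0.01)); // prints true
        System.out.println(keepEntry(new double[] {2.5, -2.5}, 0.01)); // prints false
    }
}
```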
-
removeStructuresNotNearHull
public void removeStructuresNotNearHull(double energyAboveHull)
Removes all structures with energy above hull greater than the provided value. If the energy above the hull is not defined, the entry is removed.
Parameters:
energyAboveHull - The maximum allowed energy above the hull, in eV / atom.
-
removeUnstableEntries
public void removeUnstableEntries(double maxEnergyAboveHull)
Removes all structures with energy above hull greater than the provided value. If the energy above the hull is not defined, the entry is not removed.
Parameters:
maxEnergyAboveHull - The maximum allowed energy above the hull, in eV / atom.
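The two hull filters above differ only in how they treat an undefined energy above the hull. Assuming, as the addEntry documentation suggests, that an unknown energy is stored as Double.NaN, the keep/remove decision for a single entry can be sketched as two predicates; the class and method names are hypothetical.

```java
final class HullFilterSketch {

    // Intended to follow removeStructuresNotNearHull: keep only entries with a defined
    // energy that does not exceed the cutoff (an undefined, NaN energy is removed).
    static boolean keptNearHull(double energyAboveHull, double cutoff) {
        return !Double.isNaN(energyAboveHull) && energyAboveHull <= cutoff;
    }

    // Intended to follow removeUnstableEntries: an undefined (NaN) energy is kept.
    static boolean keptStable(double energyAboveHull, double maxEnergyAboveHull) {
        return Double.isNaN(energyAboveHull) || energyAboveHull <= maxEnergyAboveHull;
    }

    public static void main(String[] args) {
        double cutoff = 0.05; // eV / atom
        System.out.println(keptNearHull(Double.NaN, cutoff)); // prints false
        System.out.println(keptStable(Double.NaN, cutoff)); // prints true
        System.out.println(keptNearHull(0.02, cutoff)); // prints true
        System.out.println(keptStable(0.10, cutoff)); // prints false
    }
}
```

An entry exactly at the cutoff is kept by both, since only energies greater than the provided value are removed.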
-
getMinIntegerOxidationState
public int getMinIntegerOxidationState()
Gets the lowest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.
Returns:
the lowest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.
-
getMaxIntegerOxidationState
public int getMaxIntegerOxidationState()
Gets the highest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.
Returns:
the highest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.
-
numEntries
public int numEntries()
The total number of entries in this data set.
Returns:
the total number of entries in this data set.
-
getEntry
Returns the "entryNum"'th entry in this data set.
Parameters:
entryNum - The index of the entry to be returned.
Returns:
the "entryNum"'th entry in this data set.
-
addEntry
public OxidationStateData.Entry addEntry(String structureID, String composition, IonFactory.Ion[] ions, String[] sources, double energyAboveHull, double gii)
Add an entry to this data set.
Parameters:
structureID - The ID for this entry. Entries do not need to have unique IDs in the case of monatomic / polyatomic compositions for the same structure, but if non-unique IDs are used in other contexts, some functionality might not work as expected.
composition - The composition for this entry.
ions - The ions (including oxidation states) in this entry.
sources - Where this entry came from. Multiple sources are allowed.
energyAboveHull - The energy above the convex hull, in eV / atom, or Double.NaN if unknown.
gii - The global instability index for this entry, or Double.NaN if unknown.
Returns:
the entry that was added.
-
setEnergiesAboveHull
Sets the energies above the hull for entries in the given map. The energies are set for all entries with the given ID, even if multiple entries share the same ID.
Parameters:
energiesByID - A map in which each key is an entry ID and each value is the energy above the hull for that ID, in eV / atom.
-
writeLikelihoods
Writes the calculated likelihood scores, along with information about the composition and ions, for all entries in this data set to the given file.
Parameters:
calculator - The calculator used to calculate the likelihood score.
fileName - The name of the file to be written.
-