java.lang.Object

_global.tri.oxidationstates.fitting.OxidationStateData

public class OxidationStateData extends Object

This is the main class for data sets (e.g. testing, training data).

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

class

OxidationStateData.Entry

Represents a single data point in the data set

Constructor Summary

Constructors

Constructor

Description

OxidationStateData(String fileName, boolean removeNonInteger, double hullCutoff, boolean removeZeroOxidation, boolean removeZintl, String structDir)

Read a data set from a given file and remove entries according to the given options

OxidationStateData(String fileName, String structDir)

Read a data set from a given file

OxidationStateData(Collection<OxidationStateData.Entry> entries, String structDir)

Create a data set with the provided entries

Method Summary

Modifier and Type

Method

Description

OxidationStateData.Entry

addEntry(String structureID, String composition, IonFactory.Ion[] ions, String[] sources, double energyAboveHull, double gii)

Add an entry to this data set

OxidationStateData

copy()

Returns a copy of this data set

void

dataKeepOnlyIons(Set<IonFactory.Ion> allowedIons)

Only keep entries for which all ions are in the given set

HashMap<IonFactory.Ion,Integer>

getCountsByIon()

Returns a map in which the keys are the ions contained in this data set and the values are the number of entries that contain the corresponding ion.

OxidationStateData.Entry

getEntry(int entryNum)

Returns the "entryNum"'th entry in this data set.

HashMap<String,int[]>

getKnownOxidationStates()

Returns a map of oxidation states for each ion type in this data set.

int

getMaxIntegerOxidationState()

Gets the highest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.

int

getMinIntegerOxidationState()

Gets the lowest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.

String

getStructDir()

Returns the directory with atomic structure files for the entries

HashMap<String,OxidationStateData.Entry>

getUniqueEntries(boolean keepPolyIons)

When entries with compositions written in terms of polyatomic ions are added to the data set, there will be two entries with the same ID: one with a composition written in terms of monatomic ions, and one with composition written in terms of polyatomic ions.

int

numEntries()

The total number of entries in this data set.

void

printNumEntriesByIon()

Prints to standard output the number of entries containing each ion in this data set.

void

removeEntries(OxidationStateData entriesToRemove)

Removes the entries in the given set.

void

removeEntriesWithNonIntegerStates()

Removes all entries that contain oxidation states that are not within 0.01 of an integer.

void

removeEntriesWithNonIntegerStates(double tolerance)

Removes all entries that contain oxidation states that are not within "tolerance" of an integer.

void

removeEntriesWithZeroOxidationStates()

Removes all entries for which at least one of the ions has an oxidation state of zero.

void

removeEntriesWithZintlIons()

Removes all entries with Zintl ions, as determined by the ZintlIonFinder.

void

removeGIIDecrease(String structDirectory, String refStructDirectory, LikelihoodCalculator calculator)

Removes all entries for which the GII in structDirectory is less than the GII in refStructDirectory.

void

removeNonChargeBalancedStructures(String structDirName)

Removes all entries that do not have charge neutral structures, defined as structures for which all of the oxidation states of the atoms in each unit cell add up to zero.

void

removeRandomEntries(double percentToRemove)

Removes a random subset of this data set

void

removeStructuresNotNearHull(double energyAboveHull)

Removes all structures with energy above hull greater than the provided value.

void

removeUncommonIonsByCount(int minAllowedCount)

Removes all entries that contain rare ions, where "rare" ions are those that appear in fewer than minAllowedCount entries

void

removeUncommonOxidationStates(double minAllowedFraction)

Removes entries containing rare ions, where an ion is rare if the fraction of entries it appears in for its ion type is less than minAllowedFraction

void

removeUnstableEntries(double maxEnergyAboveHull)

Removes all structures with energy above hull greater than the provided value.

void

setEnergiesAboveHull(Map<String,Double> energiesByID)

Sets the energies above the hull for entries in the given map.

OxidationStateData[]

splitData(int numSplits)

Randomly split the data into numSplits test sets.

void

writeFile(Writer writer)

Writes a file containing this data set

void

writeFile(String fileName)

Writes a file containing this data set

void

writeLikelihoods(LikelihoodCalculator calculator, String fileName)

Writes the calculated likelihood scores, along with information about the composition and ions, for all entries in this data set to the given file.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- OxidationStateData
  
  public OxidationStateData(Collection<OxidationStateData.Entry> entries, String structDir)
  
  Create a data set with the provided entries
  
  Parameters:
  
  entries - The entries to include in this data set
  
  structDir - A directory that contains a structure file for each of the entries, in VASP POSCAR format
- OxidationStateData
  
  public OxidationStateData(String fileName, String structDir)
  
  Read a data set from a given file
  
  Parameters:
  
  fileName - The name of the given file
  
  structDir - A directory that contains a structure file for each of the entries, in VASP POSCAR format
- OxidationStateData
  
  public OxidationStateData(String fileName, boolean removeNonInteger, double hullCutoff, boolean removeZeroOxidation, boolean removeZintl, String structDir)
  
  Read a data set from a given file and remove entries according to the given options
  
  Parameters:
  
  fileName - The name of the given file
  
  removeNonInteger - Remove all entries that contain oxidation states with non-integer values
  
  hullCutoff - An energy in eV / atom. All entries whose energy above the convex hull exceeds this value will be removed.
  
  removeZeroOxidation - Remove entries with oxidation states of zero
  
  removeZintl - Remove entries that contain Zintl ions, as determined by the ZintlIonFinder
  
  structDir - A directory that contains a structure file for each of the entries, in VASP POSCAR format
Method Details
- writeFile
  
  public void writeFile(String fileName)
  
  Writes a file containing this data set
  
  Parameters:
  
  fileName - The name of the file to be written
- writeFile
  
  public void writeFile(Writer writer) throws IOException
  
  Writes a file containing this data set
  
  Parameters:
  
  writer - The file will be written to this writer
  
  Throws:
  
  IOException - if there is an I/O error
- removeRandomEntries
  
  public void removeRandomEntries(double percentToRemove)
  
  Removes a random subset of this data set
  
  Parameters:
  
  percentToRemove - The percent of entries to remove (rounded off).
- copy
  
  public OxidationStateData copy()
  
  Returns a copy of this data set
  
  Returns:
  
  a copy of this data set
- removeEntries
  
  public void removeEntries(OxidationStateData entriesToRemove)
  
  Removes the entries in the given set. Note that the entry objects need to be exactly the same; i.e. both this data set and the "entriesToRemove" data set should be derived from the same data set.
  
  Parameters:
  
  entriesToRemove - A data set containing the entries to be removed. Note that the entry objects need to be exactly the same; i.e. both this data set and the "entriesToRemove" data set should be derived from the same data set.
- dataKeepOnlyIons
  
  public void dataKeepOnlyIons(Set<IonFactory.Ion> allowedIons)
  
  Only keep entries for which all ions are in the given set
  
  Parameters:
  
  allowedIons - Entries will only be kept if all ions in the entry are in this set.
- splitData
  
  public OxidationStateData[] splitData(int numSplits)
  
  Randomly split the data into numSplits test sets. The union of all of the test sets will be this complete data set, and all test sets will be approximately the same size. The split is done so that no composition appears in more than one test set, so the same composition never occurs in both a test set and the corresponding training set.
  
  Parameters:
  
  numSplits - The number of test sets to generate
  
  Returns:
  
  An array of generated test sets
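
The grouping constraint described for splitData can be sketched as follows. This is a minimal illustration, not the library's implementation: the class CompositionSplitSketch, its split method, the seed parameter, and the greedy "smallest split first" assignment are all assumptions made for the example, and entries are represented only by their composition strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, not part of the documented API.
final class CompositionSplitSketch {

    /**
     * Splits entries (represented here by their composition strings) into
     * numSplits lists so that equal compositions always share a list.
     */
    static List<List<String>> split(List<String> compositions, int numSplits, long seed) {
        // Group entries that share a composition.
        Map<String, List<String>> byComposition = new LinkedHashMap<>();
        for (String composition : compositions) {
            byComposition.computeIfAbsent(composition, k -> new ArrayList<>()).add(composition);
        }

        // Randomize the order in which whole groups are assigned.
        List<List<String>> groups = new ArrayList<>(byComposition.values());
        Collections.shuffle(groups, new Random(seed));

        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            splits.add(new ArrayList<>());
        }

        // Assumed balancing rule: each group goes to the currently smallest split.
        for (List<String> group : groups) {
            List<String> smallest = splits.get(0);
            for (List<String> candidate : splits) {
                if (candidate.size() < smallest.size()) {
                    smallest = candidate;
                }
            }
            smallest.addAll(group);
        }
        return splits;
    }
}
```

Because whole groups are assigned at once, the union of the returned lists is the input and no composition can appear in two lists; the sizes are only approximately equal.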
- getStructDir
  
  public String getStructDir()
  
  Returns the directory with atomic structure files for the entries
  
  Returns:
  
  the directory with atomic structure files for the entries
- getCountsByIon
  
  public HashMap<IonFactory.Ion,Integer> getCountsByIon()
  
  Returns a map in which the keys are the ions contained in this data set and the values are the number of entries that contain the corresponding ion.
  
  Returns:
  
  a map in which the keys are the ions contained in this data set and the values are the number of entries that contain the corresponding ion.
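
As a rough illustration of the counting rule (number of entries that contain an ion, not number of occurrences), here is a small sketch. The class IonCountSketch is hypothetical, and ions are represented as plain strings rather than IonFactory.Ion objects.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the documented API.
final class IonCountSketch {

    /** Each inner list holds the ions of one entry. */
    static Map<String, Integer> countsByIon(List<List<String>> entries) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> ions : entries) {
            // De-duplicate within the entry so an ion is counted once per entry.
            for (String ion : new HashSet<>(ions)) {
                counts.merge(ion, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```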
- removeUncommonIonsByCount
  
  public void removeUncommonIonsByCount(int minAllowedCount)
  
  Removes all entries that contain rare ions, where "rare" ions are those that appear in fewer than minAllowedCount entries
  
  Parameters:
  
  minAllowedCount - The minimum number of entries an ion must appear in to not be considered rare.
- printNumEntriesByIon
  
  public void printNumEntriesByIon()
  
  Prints to standard output the number of entries containing each ion in this data set.
- removeUncommonOxidationStates
  
  public void removeUncommonOxidationStates(double minAllowedFraction)
  
  Removes entries containing rare ions, where an ion is rare if the fraction of entries it appears in for its ion type is less than minAllowedFraction
  
  Parameters:
  
  minAllowedFraction - An ion will be considered rare if the fraction of entries it appears in for its ion type is less than this value. For example, if A2+ appears in 10 entries and A3+ appears in 90 entries, then A2+ accounts for a fraction of 0.1 of the entries for ion type A, so all entries containing A2+ will be removed if minAllowedFraction is greater than 0.1.
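
The rarity test can be worked through numerically. The sketch below assumes the denominator is the total entry count for the ion type (10 + 90 = 100 in the A2+ / A3+ example, treating the two sets of entries as disjoint); the class RareStateSketch and its method are hypothetical names used only for illustration.

```java
// Hypothetical helper, not part of the documented API.
final class RareStateSketch {

    /**
     * An ion is rare when its share of the entries for its ion type
     * is strictly less than minAllowedFraction.
     */
    static boolean isRare(int countForIon, int countForIonType, double minAllowedFraction) {
        double fraction = (double) countForIon / countForIonType;
        return fraction < minAllowedFraction;
    }
}
```

With 10 of 100 entries, A2+ has a fraction of 0.1: it counts as rare for a threshold of 0.2 but not for a threshold of exactly 0.1, because the comparison is strict.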
- getUniqueEntries
  
  public HashMap<String,OxidationStateData.Entry> getUniqueEntries(boolean keepPolyIons)
  
  When entries with compositions written in terms of polyatomic ions are added to the data set, there will be two entries with the same ID: one with a composition written in terms of monatomic ions, and one with a composition written in terms of polyatomic ions. This method removes one of the two entries with the same ID. It does not change the data set, but returns a map of the remaining entries keyed by entry ID.
  
  Parameters:
  
  keepPolyIons - If true, remove the entries with duplicate ID that have monatomic ions. If false, remove the entries with duplicate ID that have polyatomic ions.
  
  Returns:
  
  A map of the remaining entries keyed by entry ID.
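
The selection rule can be sketched as below. This is an assumption-laden illustration rather than the actual implementation: UniqueEntriesSketch and its nested Entry class (with an explicit polyatomic flag) are hypothetical stand-ins for OxidationStateData.Entry, and the compositions shown are illustrative strings only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the documented API.
final class UniqueEntriesSketch {

    /** Simplified stand-in for a data set entry. */
    static final class Entry {
        final String id;
        final String composition;
        final boolean polyatomic;

        Entry(String id, String composition, boolean polyatomic) {
            this.id = id;
            this.composition = composition;
            this.polyatomic = polyatomic;
        }
    }

    /** Builds an ID-keyed map without modifying the input list. */
    static Map<String, Entry> uniqueEntries(List<Entry> entries, boolean keepPolyIons) {
        Map<String, Entry> unique = new HashMap<>();
        for (Entry entry : entries) {
            Entry existing = unique.get(entry.id);
            // Store the first entry for an ID; replace it only when the new
            // entry is of the preferred kind and the stored one is not.
            boolean preferred = entry.polyatomic == keepPolyIons;
            if (existing == null || (preferred && existing.polyatomic != keepPolyIons)) {
                unique.put(entry.id, entry);
            }
        }
        return unique;
    }
}
```

An ID that has only one entry is kept regardless of the flag; the flag matters only when both a monatomic and a polyatomic version share an ID.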
- removeEntriesWithZeroOxidationStates
  
  public void removeEntriesWithZeroOxidationStates()
  
  Removes all entries for which at least one of the ions has an oxidation state of zero.
- removeGIIDecrease
  
  public void removeGIIDecrease(String structDirectory, String refStructDirectory, LikelihoodCalculator calculator)
  
  Removes all entries for which the GII in structDirectory is less than the GII in refStructDirectory. A tolerance of 1E-6 is used when comparing GIIs. TODO: rewrite this method so that it just reads the GII from the entry (that field wasn't there when this was written).
  
  Parameters:
  
  structDirectory - A directory containing structures, where the description field gives the GII.
  
  refStructDirectory - A directory containing structures, where the description field gives the GII.
  
  calculator - A likelihood calculator used for logging purposes (tracking the likelihood score of the removed entries).
- removeNonChargeBalancedStructures
  
  public void removeNonChargeBalancedStructures(String structDirName)
  
  Removes all entries that do not have charge neutral structures, defined as structures for which all of the oxidation states of the atoms in each unit cell add up to zero.
  
  Parameters:
  
  structDirName - The name of the directory containing the structure files in VASP POSCAR format.
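
The charge-neutrality criterion amounts to summing the oxidation states of all atoms in the unit cell. A minimal sketch follows; the class ChargeBalanceSketch and the explicit tolerance parameter are assumptions (the source does not say how floating-point round-off is handled), and reading the states from POSCAR files is left out.

```java
// Hypothetical helper, not part of the documented API.
final class ChargeBalanceSketch {

    /**
     * Returns true if the oxidation states of the atoms in one unit cell
     * sum to zero within the given (assumed) tolerance.
     */
    static boolean isChargeBalanced(double[] oxidationStates, double tolerance) {
        double total = 0.0;
        for (double state : oxidationStates) {
            total += state;
        }
        return Math.abs(total) <= tolerance;
    }
}
```

For example, a cell with two atoms at +3 and three atoms at -2 sums to zero and would be kept, while a cell with one +3 and one -2 atom would be removed.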
- removeEntriesWithZintlIons
  
  public void removeEntriesWithZintlIons()
  
  Removes all entries with Zintl ions, as determined by the ZintlIonFinder.
- getKnownOxidationStates
  
  public HashMap<String,int[]> getKnownOxidationStates()
  
  Returns a map of oxidation states for each ion type in this data set. The map is keyed by the ion type ID and the values are the oxidation states, in ascending order.
  
  Returns:
  
  a map of oxidation states for each ion type in this data set. The map is keyed by the ion type ID and the values are the oxidation states, in ascending order.
- removeEntriesWithNonIntegerStates
  
  public void removeEntriesWithNonIntegerStates()
  
  Removes all entries that contain oxidation states that are not within 0.01 of an integer.
- removeEntriesWithNonIntegerStates
  
  public void removeEntriesWithNonIntegerStates(double tolerance)
  
  Removes all entries that contain oxidation states that are not within "tolerance" of an integer.
  
  Parameters:
  
  tolerance - The maximum allowed difference between the oxidation state and an integer to be considered an integer oxidation state.
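
The "within tolerance of an integer" test can be sketched as follows. NearIntegerSketch is a hypothetical class, and treating the boundary as inclusive (a difference exactly equal to the tolerance counts as an integer) is an assumption the source does not confirm.

```java
// Hypothetical helper, not part of the documented API.
final class NearIntegerSketch {

    /** True if the oxidation state differs from the nearest integer by at most tolerance. */
    static boolean isNearInteger(double oxidationState, double tolerance) {
        double nearest = Math.rint(oxidationState);
        return Math.abs(oxidationState - nearest) <= tolerance;
    }
}
```

With the default tolerance of 0.01 used by the no-argument overload, a state of 2.005 would be treated as the integer 2, while a state of 2.5 would cause its entry to be removed.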
- removeStructuresNotNearHull
  
  public void removeStructuresNotNearHull(double energyAboveHull)
  
  Removes all structures with energy above hull greater than the provided value. If the energy above the hull is not defined, the entry is removed.
  
  Parameters:
  
  energyAboveHull - The maximum allowed energy above the hull, in eV / atom.
- removeUnstableEntries
  
  public void removeUnstableEntries(double maxEnergyAboveHull)
  
  Removes all structures with energy above hull greater than the provided value. If the energy above the hull is not defined, the entry is not removed.
  
  Parameters:
  
  maxEnergyAboveHull - The maximum allowed energy above the hull, in eV / atom.
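
removeStructuresNotNearHull and removeUnstableEntries differ only in how they treat an undefined energy above the hull. Assuming "undefined" is represented as Double.NaN (as the addEntry parameter description suggests), the two behaviors can be sketched with the comparisons below; HullFilterSketch and its method names are hypothetical.

```java
// Hypothetical helper, not part of the documented API.
final class HullFilterSketch {

    /**
     * Strict rule (as described for removeStructuresNotNearHull):
     * an undefined (NaN) energy leads to removal.
     */
    static boolean shouldRemoveStrict(double energyAboveHull, double cutoff) {
        // Any comparison with NaN is false, so !(NaN <= cutoff) is true.
        return !(energyAboveHull <= cutoff);
    }

    /**
     * Lenient rule (as described for removeUnstableEntries):
     * an undefined (NaN) energy is kept.
     */
    static boolean shouldRemoveLenient(double energyAboveHull, double cutoff) {
        // NaN > cutoff is false, so the entry is not removed.
        return energyAboveHull > cutoff;
    }
}
```

For defined energies the two rules agree; they diverge only for NaN, which is why the choice between the two methods matters for data sets with missing hull energies.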
- getMinIntegerOxidationState
  
  public int getMinIntegerOxidationState()
  
  Gets the lowest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.
  
  Returns:
  
  the lowest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.
- getMaxIntegerOxidationState
  
  public int getMaxIntegerOxidationState()
  
  Gets the highest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.
  
  Returns:
  
  the highest integer oxidation state in this data set, where any oxidation state within 0.01 of an integer is rounded to that integer.
- numEntries
  
  public int numEntries()
  
  The total number of entries in this data set.
  
  Returns:
  
  the total number of entries in this data set.
- getEntry
  
  public OxidationStateData.Entry getEntry(int entryNum)
  
  Returns the "entryNum"'th entry in this data set.
  
  Parameters:
  
  entryNum - The index of the entry to be returned.
  
  Returns:
  
  the "entryNum"'th entry in this data set.
- addEntry
  
  public OxidationStateData.Entry addEntry(String structureID, String composition, IonFactory.Ion[] ions, String[] sources, double energyAboveHull, double gii)
  
  Add an entry to this data set
  
  Parameters:
  
  structureID - The ID for this entry. Entries do not need to have unique IDs in the case of monatomic / polyatomic compositions for the same structure, but if non-unique IDs are used in other contexts some functionality might not work as expected.
  
  composition - The composition for this entry.
  
  ions - The ions (including oxidation states) in this entry.
  
  sources - Where this entry came from. Multiple sources are allowed.
  
  energyAboveHull - The energy above the convex hull, in eV / atom. Double.NaN if unknown.
  
  gii - The global instability index for this entry. Double.NaN if unknown.
  
  Returns:
  
  the entry that was added.
- setEnergiesAboveHull
  
  public void setEnergiesAboveHull(Map<String,Double> energiesByID)
  
  Sets the energies above the hull for entries in the given map. The energies are set for all entries with the given ID, even if multiple entries share the same ID.
  
  Parameters:
  
  energiesByID - A map in which the key is an entry ID and the value is the energy above the hull in eV / atom. The energies are set for all entries with the given ID, even if multiple entries share the same ID.
- writeLikelihoods
  
  public void writeLikelihoods(LikelihoodCalculator calculator, String fileName)
  
  Writes the calculated likelihood scores, along with information about the composition and ions, for all entries in this data set to the given file.
  
  Parameters:
  
  calculator - The calculator used to calculate the likelihood score.
  
  fileName - The name of the file to be written.