astrobase.lcproc.tfa module

This contains functions to run the Trend Filtering Algorithm (TFA) in a parallelized manner on large collections of light curves.

astrobase.lcproc.tfa.tfa_templates_lclist(lclist, outfile, lcinfo_pkl=None, target_template_frac=0.1, max_target_frac_obs=0.25, min_template_number=10, max_template_number=1000, max_rms=0.15, max_mult_above_magmad=1.5, max_mult_above_mageta=1.5, mag_bandpass='sdssr', custom_bandpasses=None, mag_bright_limit=10.0, mag_faint_limit=12.0, process_template_lcs=True, template_sigclip=5.0, template_interpolate='nearest', lcformat='hat-sql', lcformatdir=None, timecols=None, magcols=None, errcols=None, nworkers=2, maxworkertasks=1000)

This selects template objects for TFA.

Selection criteria for TFA template ensemble objects:

  • not variable: use a polynomial fit to the mag-MAD relation and the eta-normal variability index to reject variable objects
  • no more than target_template_frac (10% by default) of the total number of objects in the field, capped at max_template_number, and no more than max_target_frac_obs x template_ndet objects.
  • allow shuffling of the templates if the target ends up in them
  • nothing with fewer than the median number of observations in the field
  • sigma-clip the input time series observations
  • TODO: select randomly in xi-eta space. This doesn't seem to make a large difference at the moment, so those bits have been removed for now. This function still makes plots of xi-eta for the selected template objects so their distributions can be visualized.
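The non-variability, magnitude, and observation-count criteria above can be illustrated with a small sketch. This is not astrobase's implementation: the helper name select_nonvariable, the quadratic polynomial degree, and the use of precomputed per-object MAD and ndet arrays are all assumptions made for illustration.

```python
import numpy as np

def select_nonvariable(mags, mads, ndets, max_mult_above_magmad=1.5,
                       mag_bright_limit=10.0, mag_faint_limit=12.0,
                       polydeg=2):
    """Hypothetical helper: return indices of candidate template objects."""
    mags = np.asarray(mags, dtype=float)
    mads = np.asarray(mads, dtype=float)
    ndets = np.asarray(ndets)

    # fit a low-order polynomial (degree is an assumption) to the field's
    # mag-MAD relation
    coeffs = np.polyfit(mags, mads, polydeg)
    mad_fit = np.polyval(coeffs, mags)

    # not variable: MAD must stay below max_mult_above_magmad x the fit
    nonvar = mads < max_mult_above_magmad * mad_fit
    # inside the bright/faint magnitude window
    inmag = (mags > mag_bright_limit) & (mags < mag_faint_limit)
    # at least the median number of observations in the field
    enough_obs = ndets >= np.median(ndets)

    return np.flatnonzero(nonvar & inmag & enough_obs)
```

The same pattern would apply to the mag-eta relation with max_mult_above_mageta; only the MAD branch is shown here.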

This also determines the effective cadence to which all TFA LCs will be binned: the template LC with the largest number of non-NaN observations is used as the common timebase. All template LCs are renormalized to zero.
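A minimal sketch of how the timebase could be chosen and the templates reformed onto it, using scipy.interpolate.interp1d as the parameter docs describe. The helper name build_template_matrix, the (times, mags) tuple input, extrapolation at the edges, and renormalizing by subtracting the median are assumptions, not the library's confirmed behavior.

```python
import numpy as np
from scipy.interpolate import interp1d

def build_template_matrix(template_lcs, kind="nearest"):
    """Hypothetical helper: template_lcs is a list of (times, mags) arrays."""
    # the template with the most finite (non-NaN) mags defines the timebase
    nfinite = [np.isfinite(m).sum() for _, m in template_lcs]
    ref_times, ref_mags = template_lcs[int(np.argmax(nfinite))]
    timebase = ref_times[np.isfinite(ref_mags)]

    rows = []
    for times, mags in template_lcs:
        ok = np.isfinite(times) & np.isfinite(mags)
        # reform this template onto the common timebase
        f = interp1d(times[ok], mags[ok], kind=kind,
                     bounds_error=False, fill_value="extrapolate")
        reformed = f(timebase)
        # renormalize to zero (median subtraction is an assumption here)
        rows.append(reformed - np.median(reformed))

    # one row per template, one column per timebase point
    return timebase, np.vstack(rows)
```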

Parameters:
  • lclist (list of str) – This is a list of light curves to use as input to generate the template set.
  • outfile (str) – This is the pickle filename to which the TFA template list will be written to.
  • lcinfo_pkl (str or None) – If provided, this is the path to a pickle file created by this function on a previous run that contains the LC information. It will be loaded directly instead of re-running the LC info collection. If None, the LC info pickle will be written to the same directory as outfile.
  • target_template_frac (float) – This is the fraction of total objects in lclist to use for the number of templates.
  • max_target_frac_obs (float) – This sets the number of templates to generate if the number of observations for the light curves is smaller than the number of objects in the collection. The number of templates will be set to this fraction of the number of observations if this is the case.
  • min_template_number (int) – This is the minimum number of templates to generate.
  • max_template_number (int) – This is the maximum number of templates to generate. If target_template_frac times the number of objects is greater than max_template_number, only max_template_number templates will be used.
  • max_rms (float) – This is the maximum light curve RMS for an object to consider it as a possible template ensemble member.
  • max_mult_above_magmad (float) – This is the maximum multiplier above the mag-MAD fit for an object to be considered non-variable; objects above this are treated as variable and excluded from the template ensemble.
  • max_mult_above_mageta (float) – This is the maximum multiplier above the mag-eta (variability index) fit for an object to be considered non-variable; objects above this are treated as variable and excluded from the template ensemble.
  • mag_bandpass (str) – This sets the key in the light curve dict’s objectinfo dict to use as the canonical magnitude for the object and apply any magnitude limits to.
  • custom_bandpasses (dict or None) – This can be used to provide any custom band name keys to the star feature collection function.
  • mag_bright_limit (float or list of floats) – This sets the brightest mag (in the mag_bandpass filter) for a potential member of the TFA template ensemble. If this is a single float, the value will be used for all magcols. If this is a list of floats with len = len(magcols), the specific bright limits will be used for each magcol individually.
  • mag_faint_limit (float or list of floats) – This sets the faintest mag (in the mag_bandpass filter) for a potential member of the TFA template ensemble. If this is a single float, the value will be used for all magcols. If this is a list of floats with len = len(magcols), the specific faint limits will be used for each magcol individually.
  • process_template_lcs (bool) – If True, will reform the template light curves to the chosen timebase. If False, will only select light curves for templates but not process them. This is useful for initial exploration of how the template LCs are selected.
  • template_sigclip (float or sequence of floats or None) – This sets the sigma-clip to be applied to the template light curves.
  • template_interpolate (str) – This sets the kwarg to pass to scipy.interpolate.interp1d to set the kind of interpolation to use when reforming light curves to the TFA template timebase.
  • lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves in lclist.
  • lcformatdir (str or None) – If this is provided, gives the path to a directory where you've stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that's not currently registered with lcproc.
  • timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
  • magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
  • errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
  • nworkers (int) – The number of parallel workers to launch.
  • maxworkertasks (int) – The maximum number of tasks to run per worker before it is replaced by a fresh one.
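The template_sigclip (and the per-LC sigclip) parameters accept a single float, a pair of floats, or None. The sketch below shows one way such a clip can work; the helper name, the MAD-based scatter estimate, and the convention that the pair means [sigma for fainter values, sigma for brighter values] are assumptions rather than confirmed library behavior.

```python
import numpy as np

def sigclip_magseries(times, mags, errs, sigclip=5.0):
    """Hypothetical sigma-clip for a mag series (larger mag = fainter)."""
    times, mags, errs = (np.asarray(a, dtype=float) for a in (times, mags, errs))
    # always drop non-finite points first
    finite = np.isfinite(times) & np.isfinite(mags) & np.isfinite(errs)
    times, mags, errs = times[finite], mags[finite], errs[finite]
    if sigclip is None:
        return times, mags, errs

    median = np.median(mags)
    # robust scatter estimate: MAD scaled to a Gaussian-equivalent sigma
    stdev = 1.4826 * np.median(np.abs(mags - median))

    if np.isscalar(sigclip):
        # symmetric clip
        keep = np.abs(mags - median) < sigclip * stdev
    else:
        # assumed convention: [sigma for fainter values, sigma for brighter]
        faint_sig, bright_sig = sigclip
        keep = (((mags - median) < faint_sig * stdev) &
                ((median - mags) < bright_sig * stdev))
    return times[keep], mags[keep], errs[keep]
```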
Returns:

This function returns a dict that can be passed directly to apply_tfa_magseries below. It can optionally produce a pickle with the same dict, which can also be passed to that function.

Return type:

dict
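The limits described by target_template_frac, max_target_frac_obs, min_template_number, and max_template_number can be combined as in the sketch below. The helper name and the order in which the limits are applied are assumptions; the documentation above does not pin down the exact ordering.

```python
def choose_template_count(nobjects, ndet, target_template_frac=0.1,
                          max_target_frac_obs=0.25, min_template_number=10,
                          max_template_number=1000):
    """Hypothetical helper illustrating the documented template-count limits."""
    # start from a fixed fraction of all objects in lclist
    ntemplates = int(target_template_frac * nobjects)
    # with fewer observations than objects, cap at a fraction of the observations
    if ndet < nobjects:
        ntemplates = min(ntemplates, int(max_target_frac_obs * ndet))
    # finally clamp to [min_template_number, max_template_number]
    return max(min_template_number, min(ntemplates, max_template_number))
```

For example, 5000 objects with 1000 observations each would give min(500, 250) = 250 templates under this ordering.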

astrobase.lcproc.tfa.apply_tfa_magseries(lcfile, timecol, magcol, errcol, templateinfo, mintemplatedist_arcmin=10.0, lcformat='hat-sql', lcformatdir=None, interp='nearest', sigclip=5.0)

This applies the TFA correction to an LC given TFA template information.
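At its core, TFA models the target LC as a linear combination of the template LCs (already reformed onto the common timebase) and subtracts that model. The sketch below shows this least-squares step with numpy only; the helper name tfa_correct and the choice to fit the median-subtracted target are assumptions, and the real function also handles file I/O, sigma-clipping, and template filtering.

```python
import numpy as np

def tfa_correct(target_mags, template_matrix):
    """Hypothetical TFA core: target_mags (N,), template_matrix (M, N)."""
    y = np.asarray(target_mags, dtype=float)
    A = np.asarray(template_matrix, dtype=float).T   # design matrix, (N, M)
    median = np.median(y)
    # least-squares weights c minimizing ||(y - median) - A c||^2
    coeffs, *_ = np.linalg.lstsq(A, y - median, rcond=None)
    # subtract the systematics model; the result scatters about the target median
    return y - A @ coeffs, coeffs
```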

Parameters:
  • lcfile (str) – This is the light curve file to apply the TFA correction to.
  • timecol,magcol,errcol (str) – These are the column keys in the lcdict for the LC file to apply the TFA correction to.
  • templateinfo (dict or str) – This is either the dict produced by tfa_templates_lclist or the path to the pickle produced by the same function.
  • mintemplatedist_arcmin (float) – This sets the minimum distance required from the target object for objects in the TFA template ensemble. Objects closer than this distance will be removed from the ensemble.
  • lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to read lcfile.
  • lcformatdir (str or None) – If this is provided, gives the path to a directory where you've stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that's not currently registered with lcproc.
  • interp (str) – This is passed to scipy.interpolate.interp1d as the kind of interpolation to use when reforming this light curve to the timebase of the TFA templates.
  • sigclip (float or sequence of two floats or None) – This is the sigma clip to apply to this light curve before running TFA on it.
Returns:

This returns the filename of the light curve file generated after applying TFA. This is a pickle (that can be read by lcproc.read_pklc) in the same directory as lcfile. The magcol will be encoded in the filename, so each magcol in lcfile gets its own output file.

Return type:

str
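The mintemplatedist_arcmin filter removes templates that lie too close to the target on the sky. A minimal sketch using the haversine great-circle separation; the helper names and the assumption that coordinates are RA/Dec in decimal degrees are illustrative only.

```python
import numpy as np

def angular_sep_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcmin (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = (np.sin((dec2 - dec1) / 2.0) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2.0) ** 2)
    # clip guards against tiny floating-point excursions above 1
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))) * 60.0

def far_enough_templates(target_ra, target_decl, tmpl_ra, tmpl_decl,
                         mintemplatedist_arcmin=10.0):
    """Hypothetical helper: boolean mask of templates to keep."""
    sep = angular_sep_arcmin(target_ra, target_decl,
                             np.asarray(tmpl_ra, dtype=float),
                             np.asarray(tmpl_decl, dtype=float))
    # objects closer than the minimum distance are dropped
    return sep >= mintemplatedist_arcmin
```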

astrobase.lcproc.tfa.parallel_tfa_lclist(lclist, templateinfo, timecols=None, magcols=None, errcols=None, lcformat='hat-sql', lcformatdir=None, interp='nearest', sigclip=5.0, mintemplatedist_arcmin=10.0, nworkers=2, maxworkertasks=1000)

This applies TFA in parallel to all LCs in the given list of file names.

Parameters:
  • lclist (list of str) – This is a list of light curve files to apply the TFA correction to.
  • templateinfo (dict or str) – This is either the dict produced by tfa_templates_lclist or the path to the pickle produced by the same function.
  • timecols (list of str or None) – The timecol keys to use from the lcdict in applying TFA corrections.
  • magcols (list of str or None) – The magcol keys to use from the lcdict in applying TFA corrections.
  • errcols (list of str or None) – The errcol keys to use from the lcdict in applying TFA corrections.
  • lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to read the light curves in lclist.
  • lcformatdir (str or None) – If this is provided, gives the path to a directory where you've stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that's not currently registered with lcproc.
  • interp (str) – This is passed to scipy.interpolate.interp1d as the kind of interpolation to use when reforming the light curves to the timebase of the TFA templates.
  • sigclip (float or sequence of two floats or None) – This is the sigma clip to apply to the light curves before running TFA on them.
  • mintemplatedist_arcmin (float) – This sets the minimum distance required from the target object for objects in the TFA template ensemble. Objects closer than this distance will be removed from the ensemble.
  • nworkers (int) – The number of parallel workers to launch.
  • maxworkertasks (int) – The maximum number of tasks per worker allowed before it’s replaced by a fresh one.
Returns:

A dict containing the input file names and the output TFA light curve filenames for each input file, organized by magcol.

Return type:

dict

astrobase.lcproc.tfa.parallel_tfa_lcdir(lcdir, templateinfo, lcfileglob=None, timecols=None, magcols=None, errcols=None, lcformat='hat-sql', lcformatdir=None, interp='nearest', sigclip=5.0, mintemplatedist_arcmin=10.0, nworkers=2, maxworkertasks=1000)

This applies TFA in parallel to all LCs in a directory.

Parameters:
  • lcdir (str) – This is the directory containing the light curve files to process.
  • templateinfo (dict or str) – This is either the dict produced by tfa_templates_lclist or the path to the pickle produced by the same function.
  • lcfileglob (str or None) – The UNIX file glob to use when searching for light curve files in lcdir. If None, the default file glob associated with the registered LC format is used.
  • timecols (list of str or None) – The timecol keys to use from the lcdict in applying TFA corrections.
  • magcols (list of str or None) – The magcol keys to use from the lcdict in applying TFA corrections.
  • errcols (list of str or None) – The errcol keys to use from the lcdict in applying TFA corrections.
  • lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves in lcdir.
  • lcformatdir (str or None) – If this is provided, gives the path to a directory where you've stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that's not currently registered with lcproc.
  • interp (str) – This is passed to scipy.interpolate.interp1d as the kind of interpolation to use when reforming the light curves to the timebase of the TFA templates.
  • sigclip (float or sequence of two floats or None) – This is the sigma clip to apply to the light curves before running TFA on them.
  • mintemplatedist_arcmin (float) – This sets the minimum distance required from the target object for objects in the TFA template ensemble. Objects closer than this distance will be removed from the ensemble.
  • nworkers (int) – The number of parallel workers to launch.
  • maxworkertasks (int) – The maximum number of tasks per worker allowed before it’s replaced by a fresh one.
Returns:

A dict containing the input file names and the output TFA light curve filenames for each input file, organized by magcol.

Return type:

dict
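The directory variant presumably collects the matching files and then behaves like the list-based function. A minimal sketch of the file-collection step; the helper name find_lcfiles and the "*.pkl" fallback glob are assumptions (the real default comes from the registered LC format).

```python
import glob
import os

def find_lcfiles(lcdir, lcfileglob=None, default_glob="*.pkl"):
    """Hypothetical helper: collect LC files in lcdir for a list-based run."""
    # fall back to a format-specific default glob when none is given
    pattern = lcfileglob if lcfileglob is not None else default_glob
    # sort for a deterministic processing order
    return sorted(glob.glob(os.path.join(lcdir, pattern)))
```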