astrobase.lcproc.lcpfeatures module

This contains functions to generate periodic light curve features for later variable star classification.

astrobase.lcproc.lcpfeatures.get_periodicfeatures(pfpickle, lcbasedir, outdir, fourierorder=5, transitparams=(-0.01, 0.1, 0.1), ebparams=(-0.2, 0.3, 0.7, 0.5), pdiff_threshold=0.0001, sidereal_threshold=0.0001, sampling_peak_multiplier=5.0, sampling_startp=None, sampling_endp=None, starfeatures=None, timecols=None, magcols=None, errcols=None, lcformat='hat-sql', lcformatdir=None, sigclip=10.0, verbose=True, raiseonfail=False)[source]

This gets all periodic features for the object.

Parameters:
  • pfpickle (str) – The period-finding result pickle containing period-finder results to use for the calculation of LC fit, periodogram, and phased LC features.
  • lcbasedir (str) – The base directory where the light curve for the current object is located.
  • outdir (str) – The output directory where the results will be written.
  • fourierorder (int) – The Fourier order to use when generating the sinusoidal function that is fit to the phased light curve.
  • transitparams (list of floats) – The transit depth, duration, and ingress duration to use to generate a trapezoid planet transit model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
  • ebparams (list of floats) – The primary eclipse depth, eclipse duration, the primary-secondary depth ratio, and the phase of the secondary eclipse to use to generate an eclipsing binary model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
  • pdiff_threshold (float) – This is the max difference between periods to consider them the same.
  • sidereal_threshold (float) – This is the max difference between any of the ‘best’ periods and the sidereal day periods to consider them the same.
  • sampling_peak_multiplier (float) – This is the minimum multiplicative factor of a ‘best’ period’s normalized periodogram peak over the sampling periodogram peak at the same period required to accept the ‘best’ period as possibly real.
  • sampling_startp, sampling_endp (float or None) – If the pgramlist doesn’t have a time-sampling Lomb-Scargle periodogram, it will be obtained automatically. Use these kwargs to set the minimum and maximum periods of the interval searched when generating this periodogram.
  • starfeatures (str or None) – If not None, this should be the filename of the starfeatures-<objectid>.pkl created by astrobase.lcproc.lcsfeatures.get_starfeatures() for this object. This is used to get the neighbor’s light curve and phase it with this object’s period to see if this object is blended.
  • timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
  • magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
  • errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
  • lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
  • lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
  • sigclip (float or int or sequence of two floats/ints or None) –

    If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input time-series.

    If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.

    If sigclip is None, no sigma-clipping will be performed, and the time-series (with non-finite elems removed) will be passed through to the output.

  • verbose (bool) – If True, will indicate progress while working.
  • raiseonfail (bool) – If True, will raise an Exception if something goes wrong.
Returns:

Returns a filename for the output pickle containing all of the periodic features for the input object’s LC.

Return type:

str
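
The asymmetric sigclip behaviour described above can be illustrated with a minimal, standard-library sketch. This is not the library's implementation: the function name asymmetric_sigclip, the use of the median as the centre, and the scatter estimate of 1.483 times the median absolute deviation are all assumptions made for illustration. The magsarefluxes flag here is a plain argument of the sketch that decides which direction counts as a dimming.

```python
import math
import statistics


def asymmetric_sigclip(values, sigclip=(10.0, 3.0), magsarefluxes=False):
    """Return the finite values that survive an asymmetric sigma-clip."""
    # Drop non-finite elements first, as the sigclip description requires.
    finite = [v for v in values if math.isfinite(v)]
    median = statistics.median(finite)

    # Assumption: robust scatter from the median absolute deviation (MAD);
    # 1.483 * MAD approximates one standard deviation for Gaussian noise.
    mad = statistics.median([abs(v - median) for v in finite])
    sigma = 1.483 * mad
    dim_sigma, bright_sigma = sigclip

    kept = []
    for v in finite:
        # Magnitudes grow as an object gets fainter; fluxes shrink.
        dimming = (median - v) if magsarefluxes else (v - median)
        if dimming > dim_sigma * sigma:
            continue  # dimming larger than the first sigclip element allows
        if -dimming > bright_sigma * sigma:
            continue  # brightening larger than the second element allows
        kept.append(v)
    return kept


series = [10.0, 10.1, 9.9, 10.0, 10.05, 9.95, 10.0, 10.5, 9.5]
kept_as_mags = asymmetric_sigclip(series, sigclip=(10.0, 3.0))
kept_as_fluxes = asymmetric_sigclip(series, sigclip=(10.0, 3.0),
                                    magsarefluxes=True)
```

With the series treated as magnitudes, the 10.5 point is a dimming and survives the loose 10-sigma cut, while the 9.5 point is a brightening and is removed by the 3-sigma cut. Treating the same numbers as fluxes reverses which of the two points is kept.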

astrobase.lcproc.lcpfeatures.serial_periodicfeatures(pfpkl_list, lcbasedir, outdir, starfeaturesdir=None, fourierorder=5, transitparams=(-0.01, 0.1, 0.1), ebparams=(-0.2, 0.3, 0.7, 0.5), pdiff_threshold=0.0001, sidereal_threshold=0.0001, sampling_peak_multiplier=5.0, sampling_startp=None, sampling_endp=None, starfeatures=None, timecols=None, magcols=None, errcols=None, lcformat='hat-sql', lcformatdir=None, sigclip=10.0, verbose=False, maxobjects=None)[source]

This drives the periodicfeatures collection for a list of periodfinding pickles.

Parameters:
  • pfpkl_list (list of str) – The list of period-finding pickles to use.
  • lcbasedir (str) – The base directory where the associated light curves are located.
  • outdir (str) – The directory where the results will be written.
  • starfeaturesdir (str or None) – The directory containing the starfeatures-<objectid>.pkl files for each object to use to calculate neighbor proximity light curve features.
  • fourierorder (int) – The Fourier order to use when generating the sinusoidal function that is fit to the phased light curve.
  • transitparams (list of floats) – The transit depth, duration, and ingress duration to use to generate a trapezoid planet transit model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
  • ebparams (list of floats) – The primary eclipse depth, eclipse duration, the primary-secondary depth ratio, and the phase of the secondary eclipse to use to generate an eclipsing binary model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
  • pdiff_threshold (float) – This is the max difference between periods to consider them the same.
  • sidereal_threshold (float) – This is the max difference between any of the ‘best’ periods and the sidereal day periods to consider them the same.
  • sampling_peak_multiplier (float) – This is the minimum multiplicative factor of a ‘best’ period’s normalized periodogram peak over the sampling periodogram peak at the same period required to accept the ‘best’ period as possibly real.
  • sampling_startp, sampling_endp (float or None) – If the pgramlist doesn’t have a time-sampling Lomb-Scargle periodogram, it will be obtained automatically. Use these kwargs to set the minimum and maximum periods of the interval searched when generating this periodogram.
  • timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
  • magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
  • errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
  • lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
  • lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
  • sigclip (float or int or sequence of two floats/ints or None) –

    If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input time-series.

    If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.

    If sigclip is None, no sigma-clipping will be performed, and the time-series (with non-finite elems removed) will be passed through to the output.

  • verbose (bool) – If True, will indicate progress while working.
  • maxobjects (int) – The total number of objects to process from pfpkl_list.
Returns:

Nothing.
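
The three thresholds pdiff_threshold, sidereal_threshold, and sampling_peak_multiplier describe simple comparisons, which can be sketched as follows. This is a hedged illustration, not the library's code: the helper names, the use of an absolute (rather than fractional) period difference, and the single reference period in reference_periods are assumptions. The library may compare against several sidereal-day-related periods.

```python
# Length of one sidereal day in mean solar days (about 23 h 56 min 4 s).
SIDEREAL_DAY = 0.9972696


def periods_match(period_a, period_b, pdiff_threshold=1.0e-4):
    """True if two periods differ by less than pdiff_threshold."""
    # Assumption: the threshold applies to the absolute difference in days.
    return abs(period_a - period_b) < pdiff_threshold


def near_sidereal(period, sidereal_threshold=1.0e-4,
                  reference_periods=(SIDEREAL_DAY,)):
    """True if the period lies within the threshold of a reference period."""
    # reference_periods is a hypothetical argument used only in this sketch.
    return any(abs(period - ref) < sidereal_threshold
               for ref in reference_periods)


def peak_is_credible(best_peak, sampling_peak, sampling_peak_multiplier=5.0):
    """True if the 'best' peak exceeds the sampling peak by the multiplier."""
    return best_peak >= sampling_peak_multiplier * sampling_peak
```

For example, periods_match(1.23456, 1.23460) is true at the default threshold, and a normalized peak of 0.3 over a sampling peak of 0.1 fails the default multiplier of 5.0.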

astrobase.lcproc.lcpfeatures.parallel_periodicfeatures(pfpkl_list, lcbasedir, outdir, starfeaturesdir=None, fourierorder=5, transitparams=(-0.01, 0.1, 0.1), ebparams=(-0.2, 0.3, 0.7, 0.5), pdiff_threshold=0.0001, sidereal_threshold=0.0001, sampling_peak_multiplier=5.0, sampling_startp=None, sampling_endp=None, timecols=None, magcols=None, errcols=None, lcformat='hat-sql', lcformatdir=None, sigclip=10.0, verbose=False, maxobjects=None, nworkers=2)[source]

This runs periodic feature generation in parallel for all periodfinding pickles in the input list.

Parameters:
  • pfpkl_list (list of str) – The list of period-finding pickles to use.
  • lcbasedir (str) – The base directory where the associated light curves are located.
  • outdir (str) – The directory where the results will be written.
  • starfeaturesdir (str or None) – The directory containing the starfeatures-<objectid>.pkl files for each object to use to calculate neighbor proximity light curve features.
  • fourierorder (int) – The Fourier order to use when generating the sinusoidal function that is fit to the phased light curve.
  • transitparams (list of floats) – The transit depth, duration, and ingress duration to use to generate a trapezoid planet transit model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
  • ebparams (list of floats) – The primary eclipse depth, eclipse duration, the primary-secondary depth ratio, and the phase of the secondary eclipse to use to generate an eclipsing binary model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
  • pdiff_threshold (float) – This is the max difference between periods to consider them the same.
  • sidereal_threshold (float) – This is the max difference between any of the ‘best’ periods and the sidereal day periods to consider them the same.
  • sampling_peak_multiplier (float) – This is the minimum multiplicative factor of a ‘best’ period’s normalized periodogram peak over the sampling periodogram peak at the same period required to accept the ‘best’ period as possibly real.
  • sampling_startp, sampling_endp (float or None) – If the pgramlist doesn’t have a time-sampling Lomb-Scargle periodogram, it will be obtained automatically. Use these kwargs to set the minimum and maximum periods of the interval searched when generating this periodogram.
  • timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
  • magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
  • errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
  • lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
  • lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
  • sigclip (float or int or sequence of two floats/ints or None) –

    If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input time-series.

    If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.

    If sigclip is None, no sigma-clipping will be performed, and the time-series (with non-finite elems removed) will be passed through to the output.

  • verbose (bool) – If True, will indicate progress while working.
  • maxobjects (int) – The total number of objects to process from pfpkl_list.
  • nworkers (int) – The number of parallel workers to launch to process the input.
Returns:

A dict mapping each input period-finder result pickle to its output periodic feature result pickle.

Return type:

dict
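
The shape of the returned dict, together with the effect of maxobjects and nworkers, can be sketched with the standard library alone. Everything here is hypothetical: fake_periodicfeatures_worker is a stand-in that only builds an output filename (the periodicfeatures- prefix is an assumed naming scheme), and a thread pool is used in place of whatever worker mechanism the library itself uses.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fake_periodicfeatures_worker(pfpickle, outdir):
    """Hypothetical stand-in for the per-object feature calculation."""
    # Assumed naming scheme, for illustration only.
    name = os.path.basename(pfpickle).replace("periodfinding-",
                                              "periodicfeatures-")
    return os.path.join(outdir, name)


def run_in_parallel(pfpkl_list, outdir, maxobjects=None, nworkers=2):
    """Map each input pickle to its output pickle using a worker pool."""
    if maxobjects is not None:
        # Only the first maxobjects pickles are processed.
        pfpkl_list = pfpkl_list[:maxobjects]

    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        outputs = list(pool.map(
            lambda pf: fake_periodicfeatures_worker(pf, outdir),
            pfpkl_list,
        ))

    # pool.map preserves input order, so zip pairs inputs with outputs.
    return dict(zip(pfpkl_list, outputs))


inputs = ["/pf/periodfinding-A.pkl",
          "/pf/periodfinding-B.pkl",
          "/pf/periodfinding-C.pkl"]
result = run_in_parallel(inputs, "/out", maxobjects=2, nworkers=2)
```

Keying the result by input pickle makes it straightforward to find which inputs produced no usable output after a run.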

astrobase.lcproc.lcpfeatures.parallel_periodicfeatures_lcdir(pfpkl_dir, lcbasedir, outdir, pfpkl_glob='periodfinding-*.pkl*', starfeaturesdir=None, fourierorder=5, transitparams=(-0.01, 0.1, 0.1), ebparams=(-0.2, 0.3, 0.7, 0.5), pdiff_threshold=0.0001, sidereal_threshold=0.0001, sampling_peak_multiplier=5.0, sampling_startp=None, sampling_endp=None, timecols=None, magcols=None, errcols=None, lcformat='hat-sql', lcformatdir=None, sigclip=10.0, verbose=False, maxobjects=None, nworkers=2, recursive=True)[source]

This runs parallel periodicfeature extraction for a directory of periodfinding result pickles.

Parameters:
  • pfpkl_dir (str) – The directory containing the pickles to process.
  • lcbasedir (str) – The directory where all of the associated light curve files are located.
  • outdir (str) – The directory where all the output will be written.
  • pfpkl_glob (str) – The UNIX file glob to use to search for period-finder result pickles in pfpkl_dir.
  • starfeaturesdir (str or None) – The directory containing the starfeatures-<objectid>.pkl files for each object to use to calculate neighbor proximity light curve features.
  • fourierorder (int) – The Fourier order to use when generating the sinusoidal function that is fit to the phased light curve.
  • transitparams (list of floats) – The transit depth, duration, and ingress duration to use to generate a trapezoid planet transit model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
  • ebparams (list of floats) – The primary eclipse depth, eclipse duration, the primary-secondary depth ratio, and the phase of the secondary eclipse to use to generate an eclipsing binary model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
  • pdiff_threshold (float) – This is the max difference between periods to consider them the same.
  • sidereal_threshold (float) – This is the max difference between any of the ‘best’ periods and the sidereal day periods to consider them the same.
  • sampling_peak_multiplier (float) – This is the minimum multiplicative factor of a ‘best’ period’s normalized periodogram peak over the sampling periodogram peak at the same period required to accept the ‘best’ period as possibly real.
  • sampling_startp, sampling_endp (float or None) – If the pgramlist doesn’t have a time-sampling Lomb-Scargle periodogram, it will be obtained automatically. Use these kwargs to set the minimum and maximum periods of the interval searched when generating this periodogram.
  • timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
  • magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
  • errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
  • lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
  • lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
  • sigclip (float or int or sequence of two floats/ints or None) –

    If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input time-series.

    If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.

    If sigclip is None, no sigma-clipping will be performed, and the time-series (with non-finite elems removed) will be passed through to the output.

  • verbose (bool) – If True, will indicate progress while working.
  • maxobjects (int) – The total number of objects to process from the pickles found in pfpkl_dir.
  • nworkers (int) – The number of parallel workers to launch to process the input.
Returns:

A dict mapping each input period-finder result pickle to its output periodic feature result pickle.

Return type:

dict
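
How pfpkl_dir, pfpkl_glob, and maxobjects might combine to select the input pickles can be sketched with the standard glob module. This is an assumption-laden illustration: the helper find_pfpickles is hypothetical, and the interpretation of the undocumented recursive kwarg as "also search subdirectories" is a guess based on its name.

```python
import glob
import os
import tempfile


def find_pfpickles(pfpkl_dir, pfpkl_glob="periodfinding-*.pkl*",
                   recursive=True, maxobjects=None):
    """Return a sorted list of period-finder pickles under pfpkl_dir."""
    if recursive:
        # '**' matches zero or more directory levels when recursive=True.
        pattern = os.path.join(pfpkl_dir, "**", pfpkl_glob)
    else:
        pattern = os.path.join(pfpkl_dir, pfpkl_glob)

    found = sorted(glob.glob(pattern, recursive=recursive))
    if maxobjects is not None:
        found = found[:maxobjects]
    return found


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("periodfinding-A.pkl",
                os.path.join("sub", "periodfinding-B.pkl.gz"),
                "notes.txt"):
        open(os.path.join(tmp, rel), "w").close()

    recursive_names = [os.path.basename(p) for p in find_pfpickles(tmp)]
    toplevel_names = [os.path.basename(p)
                      for p in find_pfpickles(tmp, recursive=False)]
    limited_names = [os.path.basename(p)
                     for p in find_pfpickles(tmp, maxobjects=1)]
```

The trailing asterisk in the default glob also matches compressed pickles such as periodfinding-B.pkl.gz, which is why that file is picked up in the recursive search.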