Astrobase
Astrobase is a Python package for analyzing light curves and finding variable stars. It includes implementations of several period-finding algorithms, batch work drivers for working on large collections of light curves, and a webapp useful for reviewing and classifying light curves by stellar variability type. This package was spun out of a bunch of Python modules I wrote and maintain for my work with the HAT Exoplanet Surveys. It’s applicable to many other astronomical timeseries observations, and includes support for the light curves produced by Kepler and TESS in particular.
Most functions in this package that deal with light curves (e.g. in the modules astrobase.lcfit, astrobase.lcmath, astrobase.periodbase, astrobase.plotbase, and astrobase.checkplot) usually require three Numpy ndarrays as input: times, mags, and errs, so they should work with any timeseries data that can be represented in this form. If you have flux time series measurements, most functions also take a magsarefluxes keyword argument that makes them handle flux light curves correctly.
The astrobase.lcproc subpackage implements drivers for working on large collections of light curve files, and includes functions to register your own light curve format so that it gets recognized and can be worked on by other Astrobase functions transparently.
 Guides for specific tasks are available as Jupyter notebooks at GitHub: astrobase-notebooks.
 The full API documentation generated automatically from the docstrings by Sphinx is available.
 The code for Astrobase is maintained at GitHub.
Install Astrobase from PyPI using pip:
# preferably in a virtualenv
# install Numpy to compile Fortran dependencies
$ pip install numpy
# install astrobase
$ pip install astrobase
Package contents
astrobase.astrokep module
Contains various useful tools for analyzing Kepler light curves.

astrobase.astrokep.keplerflux_to_keplermag(keplerflux, f12=174000.0)
This converts the Kepler flux in electrons/sec to Kepler magnitude.
The Kepler mag/flux relation is:
fkep = (10.0**(-0.4*(kepmag - 12.0)))*f12
f12 = 1.74e5 # electrons/sec
Parameters:  keplerflux (float or array-like) – The flux value(s) to convert to magnitudes.
 f12 (float) – The flux value in the Kepler band corresponding to Kepler mag = 12.0.
Returns: Magnitudes in the Kepler band corresponding to the input keplerflux flux value(s).
Return type: np.array

astrobase.astrokep.keplermag_to_keplerflux(keplermag, f12=174000.0)
This converts the Kepler mag back to Kepler flux.
Parameters:  keplermag (float or array-like) – The Kepler magnitude value(s) to convert to fluxes.
 f12 (float) – The flux value in the Kepler band corresponding to Kepler mag = 12.0.
Returns: Fluxes in the Kepler band corresponding to the input keplermag magnitude value(s).
Return type: np.array
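The Kepler mag/flux relation documented for these two functions can be sketched in plain Python. This is a minimal illustration using only the math module, not the package's own implementation (which is documented as returning Numpy arrays); the names mag_to_flux and flux_to_mag are made up for this sketch.

```python
import math

F12 = 1.74e5  # electrons/sec at Kepler mag = 12.0 (the documented default for f12)

def mag_to_flux(kepmag, f12=F12):
    """Kepler magnitude -> flux in electrons/sec."""
    return (10.0 ** (-0.4 * (kepmag - 12.0))) * f12

def flux_to_mag(keplerflux, f12=F12):
    """Flux in electrons/sec -> Kepler magnitude (inverse of mag_to_flux)."""
    return 12.0 - 2.5 * math.log10(keplerflux / f12)

# a mag = 12.0 star has flux f12 by definition; brighter stars have more flux
flux_at_12 = mag_to_flux(12.0)
roundtrip_mag = flux_to_mag(mag_to_flux(14.5))
```

The negative exponent is what makes a numerically smaller (brighter) magnitude map to a larger flux.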

astrobase.astrokep.keplermag_to_sdssr(keplermag, kic_sdssg, kic_sdssr)
Converts magnitude measurements in Kepler band to SDSS r band.
Parameters:  keplermag (float or array-like) – The Kepler magnitude value(s) to convert to SDSS r band magnitudes.
 kic_sdssg,kic_sdssr (float or array-like) – The SDSS g and r magnitudes of the object(s) from the Kepler Input Catalog. The .llc.fits MAST light curve file for a Kepler object contains these values in the FITS extension 0 header.
Returns: SDSS r band magnitude(s) converted from the Kepler band magnitude.
Return type: float or array-like

astrobase.astrokep.flux_ppm_to_magnitudes(ppm)
This converts Kepler’s flux parts-per-million to magnitudes.
Mostly useful for turning PPMs reported by Kepler or TESS into millimag values to compare with ground-based surveys.
Parameters: ppm (float or array-like) – Kepler flux measurement errors or RMS values in parts-per-million.
Returns: Measurement errors or RMS values expressed in magnitudes.
Return type: float or array-like
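As a rough illustration of the ppm to magnitude conversion, the sketch below assumes the ppm value is a fractional decrease from a unit baseline flux, so that the magnitude change is -2.5 log10(1 - ppm/1e6). The exact formula used by flux_ppm_to_magnitudes is not stated in this documentation, and the name ppm_to_mag is hypothetical.

```python
import math

def ppm_to_mag(ppm):
    """Convert a flux change in parts-per-million to magnitudes.

    Assumption: the flux drops by `ppm` parts-per-million from a baseline of 1.0.
    """
    return -2.5 * math.log10(1.0 - ppm / 1.0e6)

# 1000 ppm corresponds to roughly one millimag
mmag = 1000.0 * ppm_to_mag(1000.0)
```

For small values the result is close to 1.0857e-6 mag per ppm, which is why ppm and micro-magnitudes are often treated as nearly interchangeable.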

astrobase.astrokep.read_kepler_fitslc(lcfits, headerkeys=['TIMESYS', 'BJDREFI', 'BJDREFF', 'OBJECT', 'KEPLERID', 'RA_OBJ', 'DEC_OBJ', 'EQUINOX', 'EXPOSURE', 'CDPP3_0', 'CDPP6_0', 'CDPP12_0', 'PDCVAR', 'PDCMETHD', 'CROWDSAP', 'FLFRCSAP'], datakeys=['TIME', 'TIMECORR', 'CADENCENO', 'SAP_QUALITY', 'PSF_CENTR1', 'PSF_CENTR1_ERR', 'PSF_CENTR2', 'PSF_CENTR2_ERR', 'MOM_CENTR1', 'MOM_CENTR1_ERR', 'MOM_CENTR2', 'MOM_CENTR2_ERR'], sapkeys=['SAP_FLUX', 'SAP_FLUX_ERR', 'SAP_BKG', 'SAP_BKG_ERR'], pdckeys=['PDCSAP_FLUX', 'PDCSAP_FLUX_ERR'], topkeys=['CHANNEL', 'SKYGROUP', 'MODULE', 'OUTPUT', 'QUARTER', 'SEASON', 'CAMPAIGN', 'DATA_REL', 'OBSMODE', 'PMRA', 'PMDEC', 'PMTOTAL', 'PARALLAX', 'GLON', 'GLAT', 'GMAG', 'RMAG', 'IMAG', 'ZMAG', 'D51MAG', 'JMAG', 'HMAG', 'KMAG', 'KEPMAG', 'GRCOLOR', 'JKCOLOR', 'GKCOLOR', 'TEFF', 'LOGG', 'FEH', 'EBMINUSV', 'AV', 'RADIUS', 'TMINDEX'], apkeys=['NPIXSAP', 'NPIXMISS', 'CDELT1', 'CDELT2'], appendto=None, normalize=False)
This extracts the light curve from a single Kepler or K2 LC FITS file.
This works on the light curves available at MAST:
 kplr{kepid}-{somedatething}_llc.fits files from the Kepler mission
 ktwo{epicid}-c{campaign}_llc.fits files from the K2 mission
Parameters:  lcfits (str) – The filename of a MAST Kepler/K2 light curve FITS file.
 headerkeys (list) – A list of FITS header keys that will be extracted from the FITS light curve file. These describe the observations. The default value for this is given in LCHEADERKEYS above.
 datakeys (list) – A list of FITS column names that correspond to the auxiliary measurements in the light curve. The default is LCDATAKEYS above.
 sapkeys (list) – A list of FITS column names that correspond to the SAP flux measurements in the light curve. The default is LCSAPKEYS above.
 pdckeys (list) – A list of FITS column names that correspond to the PDC flux measurements in the light curve. The default is LCPDCKEYS above.
 topkeys (list) – A list of FITS header keys that describe the object in the light curve. The default is LCTOPKEYS above.
 apkeys (list) – A list of FITS header keys that describe the flux measurement apertures used by the Kepler/K2 pipeline. The default is LCAPERTUREKEYS above.
 appendto (lcdict or None) – If appendto is an lcdict, will append measurements of this lcdict to that lcdict. This is used for consolidating light curves for the same object across different files (quarters). The appending does not care about the time order. To consolidate light curves in time order, use consolidate_kepler_fitslc below.
 normalize (bool) – If True, then each component light curve’s SAP_FLUX and PDCSAP_FLUX measurements will be normalized to 1.0 by dividing out the median flux for the component light curve.
Returns: Returns an lcdict (this is useable by most astrobase functions for LC processing).
Return type: lcdict

astrobase.astrokep.consolidate_kepler_fitslc(keplerid, lcfitsdir, normalize=True, headerkeys=['TIMESYS', 'BJDREFI', 'BJDREFF', 'OBJECT', 'KEPLERID', 'RA_OBJ', 'DEC_OBJ', 'EQUINOX', 'EXPOSURE', 'CDPP3_0', 'CDPP6_0', 'CDPP12_0', 'PDCVAR', 'PDCMETHD', 'CROWDSAP', 'FLFRCSAP'], datakeys=['TIME', 'TIMECORR', 'CADENCENO', 'SAP_QUALITY', 'PSF_CENTR1', 'PSF_CENTR1_ERR', 'PSF_CENTR2', 'PSF_CENTR2_ERR', 'MOM_CENTR1', 'MOM_CENTR1_ERR', 'MOM_CENTR2', 'MOM_CENTR2_ERR'], sapkeys=['SAP_FLUX', 'SAP_FLUX_ERR', 'SAP_BKG', 'SAP_BKG_ERR'], pdckeys=['PDCSAP_FLUX', 'PDCSAP_FLUX_ERR'], topkeys=['CHANNEL', 'SKYGROUP', 'MODULE', 'OUTPUT', 'QUARTER', 'SEASON', 'CAMPAIGN', 'DATA_REL', 'OBSMODE', 'PMRA', 'PMDEC', 'PMTOTAL', 'PARALLAX', 'GLON', 'GLAT', 'GMAG', 'RMAG', 'IMAG', 'ZMAG', 'D51MAG', 'JMAG', 'HMAG', 'KMAG', 'KEPMAG', 'GRCOLOR', 'JKCOLOR', 'GKCOLOR', 'TEFF', 'LOGG', 'FEH', 'EBMINUSV', 'AV', 'RADIUS', 'TMINDEX'], apkeys=['NPIXSAP', 'NPIXMISS', 'CDELT1', 'CDELT2'])
This gets all Kepler/K2 light curves for the given keplerid in lcfitsdir.
Searches recursively in lcfitsdir for all of the files belonging to the specified keplerid. Sorts the light curves by time. Returns an lcdict. This is meant to be used to consolidate light curves for a single object across Kepler quarters.
NOTE: keplerid is an integer (without the leading zeros). This is usually the KIC ID.
NOTE: if light curve time arrays contain nans, these and their associated measurements will be sorted to the end of the final combined arrays.
Parameters:  keplerid (int) – The Kepler ID of the object to consolidate LCs for, as an integer without any leading zeros. This is usually the KIC or EPIC ID.
 lcfitsdir (str) – The directory to look in for LCs of the specified object.
 normalize (bool) – If True, then each component light curve’s SAP_FLUX and PDCSAP_FLUX measurements will be normalized to 1.0 by dividing out the median flux for the component light curve.
 headerkeys (list) – A list of FITS header keys that will be extracted from the FITS light curve file. These describe the observations. The default value for this is given in LCHEADERKEYS above.
 datakeys (list) – A list of FITS column names that correspond to the auxiliary measurements in the light curve. The default is LCDATAKEYS above.
 sapkeys (list) – A list of FITS column names that correspond to the SAP flux measurements in the light curve. The default is LCSAPKEYS above.
 pdckeys (list) – A list of FITS column names that correspond to the PDC flux measurements in the light curve. The default is LCPDCKEYS above.
 topkeys (list) – A list of FITS header keys that describe the object in the light curve. The default is LCTOPKEYS above.
 apkeys (list) – A list of FITS header keys that describe the flux measurement apertures used by the Kepler/K2 pipeline. The default is LCAPERTUREKEYS above.
Returns: Returns an lcdict (this is useable by most astrobase functions for LC processing).
Return type: lcdict
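The time-ordering behavior noted for consolidate_kepler_fitslc (sort by time, with nan timestamps and their measurements pushed to the end) can be sketched as follows. This is an illustration in plain Python, not the package's code; sort_by_time_nans_last is a hypothetical helper and the sample values are invented.

```python
import math

def sort_by_time_nans_last(times, fluxes):
    """Sort paired measurements by time, pushing nan times to the end."""
    order = sorted(
        range(len(times)),
        # nan timestamps sort after every finite one; ties keep input order
        key=lambda i: (math.isnan(times[i]),
                       0.0 if math.isnan(times[i]) else times[i]),
    )
    return [times[i] for i in order], [fluxes[i] for i in order]

# two "quarters" appended out of order, with one nan timestamp
times = [352.4, 352.5, float("nan"), 131.5, 131.6]
fluxes = [1.01, 0.99, 1.00, 1.02, 0.98]
sorted_times, sorted_fluxes = sort_by_time_nans_last(times, fluxes)
```

Sorting an index list rather than the times themselves keeps every measurement attached to its own timestamp.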

astrobase.astrokep.read_k2sff_lightcurve(lcfits)
This reads a K2 SFF (Vanderburg+ 2014) light curve into an lcdict.
Use this with the light curves from the K2 SFF project at MAST.
Parameters: lcfits (str) – The filename of the FITS light curve file downloaded from MAST.
Returns: Returns an lcdict (this is useable by most astrobase functions for LC processing).
Return type: lcdict

astrobase.astrokep.kepler_lcdict_to_pkl(lcdict, outfile=None)
This writes the lcdict to a Python pickle.
Parameters:  lcdict (lcdict) – This is the input lcdict to write to a pickle.
 outfile (str or None) – If this is None, the object’s Kepler ID/EPIC ID will be determined from the lcdict and used to form the filename of the output pickle file. If this is a str, the provided filename will be used.
Returns: The absolute path to the written pickle file.
Return type: str

astrobase.astrokep.read_kepler_pklc(picklefile)
This turns the pickled lightcurve file back into an lcdict.
Parameters: picklefile (str) – The path to a previously written Kepler LC picklefile generated by kepler_lcdict_to_pkl above.
Returns: Returns an lcdict (this is useable by most astrobase functions for LC processing).
Return type: lcdict

astrobase.astrokep.stitch_kepler_lcdict(lcdict)
This stitches Kepler light curves together across quarters.
FIXME: implement this.
Parameters: lcdict (lcdict) – An lcdict produced by consolidate_kepler_fitslc. The flux measurements between quarters will be stitched together.
Returns: Returns an lcdict (this is useable by most astrobase functions for LC processing). The flux measurements will have been shifted to form a seamless light curve across quarters suitable for long-term variability investigation.
Return type: lcdict

astrobase.astrokep.filter_kepler_lcdict(lcdict, filterflags=True, nanfilter='sap, pdc', timestoignore=None)
This filters the Kepler lcdict, removing nans and bad observations.
By default, this function removes points in the Kepler LC that have ANY quality flags set.
Parameters:  lcdict (lcdict) – An lcdict produced by consolidate_kepler_fitslc or read_kepler_fitslc.
 filterflags (bool) – If True, will remove any measurements that have nonzero quality flags present. This usually indicates an issue with the instrument or spacecraft.
 nanfilter ({'sap','pdc','sap,pdc'}) – Indicates the flux measurement type(s) to apply the filtering to.
 timestoignore (list of tuples or None) –
This is of the form:
[(time1_start, time1_end), (time2_start, time2_end), ...]
and indicates the start and end times to mask out of the final lcdict. Use this to remove anything that wasn’t caught by the quality flags.
Returns: Returns an lcdict (this is useable by most astrobase functions for LC processing). The lcdict is filtered IN PLACE!
Return type: lcdict
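The three kinds of filtering described for filter_kepler_lcdict (quality flags, nans, and timestoignore windows) can be sketched on plain lists. This is a simplified illustration, not the package's implementation: filter_points is a hypothetical helper, it returns new lists instead of filtering in place, and treating the window endpoints as inclusive is an assumption.

```python
import math

def filter_points(times, fluxes, quality, timestoignore=None):
    """Return (times, fluxes) with flagged, nan, and masked points removed."""
    keep_t, keep_f = [], []
    for t, f, q in zip(times, fluxes, quality):
        if q != 0:
            continue  # any nonzero quality flag marks a bad observation
        if math.isnan(t) or math.isnan(f):
            continue  # drop nan times and nan fluxes
        if timestoignore and any(start <= t <= end for start, end in timestoignore):
            continue  # inside a user-specified window to mask out
        keep_t.append(t)
        keep_f.append(f)
    return keep_t, keep_f

times = [1.0, 2.0, 3.0, 4.0, 5.0]
fluxes = [1.00, float("nan"), 0.98, 1.01, 0.99]
quality = [0, 0, 128, 0, 0]
good_t, good_f = filter_points(times, fluxes, quality, timestoignore=[(4.5, 5.5)])
```

Here the point at t=2.0 is dropped for its nan flux, t=3.0 for its quality flag, and t=5.0 because it falls in the ignored window.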

astrobase.astrokep.epd_kepler_lightcurve(lcdict, xccol='mom_centr1', yccol='mom_centr2', timestoignore=None, filterflags=True, writetodict=True, epdsmooth=5)
This runs EPD on the Kepler light curve.
Following Huang et al. 2015, we fit the following EPD function to a smoothed light curve, and then subtract it to obtain EPD corrected magnitudes:
f = c0 + c1*sin(2*pi*x) + c2*cos(2*pi*x) + c3*sin(2*pi*y) + c4*cos(2*pi*y) + c5*sin(4*pi*x) + c6*cos(4*pi*x) + c7*sin(4*pi*y) + c8*cos(4*pi*y) + c9*bgv + c10*bge
By default, this function removes points in the Kepler LC that have ANY quality flags set.
Parameters:  lcdict (lcdict) – An lcdict produced by consolidate_kepler_fitslc or read_kepler_fitslc.
 xccol,yccol (str) – Indicates the x and y coordinate column names to use from the Kepler LC in the EPD fit.
 timestoignore (list of tuples) –
This is of the form:
[(time1_start, time1_end), (time2_start, time2_end), ...]
and indicates the start and end times to mask out of the final lcdict. Use this to remove anything that wasn’t caught by the quality flags.
 filterflags (bool) – If True, will remove any measurements that have nonzero quality flags present. This usually indicates an issue with the instrument or spacecraft.
 writetodict (bool) –
If writetodict is True, adds the following columns to the lcdict:
epd_time = time array
epd_sapflux = uncorrected flux before EPD
epd_epdsapflux = corrected flux after EPD
epd_epdsapcorr = EPD flux corrections
epd_bkg = background array
epd_bkg_err = background errors array
epd_xcc = x-coord array
epd_ycc = y-coord array
epd_quality = quality flag array
and updates the ‘columns’ list in the lcdict as well.
 epdsmooth (int) – Sets the number of light curve points to smooth over when generating the EPD fit function.
Returns: Returns a tuple of the form: (times, epdfluxes, fitcoeffs, epdfit)
Return type: tuple
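To make the structure of the EPD function concrete, the sketch below evaluates its 11 terms for a single observation. It only evaluates the model for given coefficients; the fitting step is not shown. The name epd_model is hypothetical, and bgv and bge are taken to be the background value and background error, as described for the decorr option of rfepd_kepler_lightcurve.

```python
import math

def epd_model(coeffs, x, y, bgv, bge):
    """Evaluate the 11-term EPD function for one observation."""
    c = coeffs
    two_pi, four_pi = 2.0 * math.pi, 4.0 * math.pi
    return (
        c[0]
        + c[1] * math.sin(two_pi * x) + c[2] * math.cos(two_pi * x)
        + c[3] * math.sin(two_pi * y) + c[4] * math.cos(two_pi * y)
        + c[5] * math.sin(four_pi * x) + c[6] * math.cos(four_pi * x)
        + c[7] * math.sin(four_pi * y) + c[8] * math.cos(four_pi * y)
        + c[9] * bgv + c[10] * bge
    )

# with only the constant and background terms non-zero, the trig terms drop out
coeffs = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.25]
value = epd_model(coeffs, x=0.3, y=0.7, bgv=2.0, bge=4.0)
```

Because the model is linear in c0 through c10, the coefficients can be found with an ordinary linear least-squares fit against the smoothed light curve.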

astrobase.astrokep.rfepd_kepler_lightcurve(lcdict, xccol='mom_centr1', yccol='mom_centr2', timestoignore=None, filterflags=True, writetodict=True, epdsmooth=23, decorr='xcc, ycc', nrftrees=200)
This uses a RandomForestRegressor to fit and decorrelate Kepler light curves.
Fits the X and Y positions, the background, and background error.
By default, this function removes points in the Kepler LC that have ANY quality flags set.
Parameters:  lcdict (lcdict) – An lcdict produced by consolidate_kepler_fitslc or read_kepler_fitslc.
 xccol,yccol (str) – Indicates the x and y coordinate column names to use from the Kepler LC in the EPD fit.
 timestoignore (list of tuples) –
This is of the form:
[(time1_start, time1_end), (time2_start, time2_end), ...]
and indicates the start and end times to mask out of the final lcdict. Use this to remove anything that wasn’t caught by the quality flags.
 filterflags (bool) – If True, will remove any measurements that have nonzero quality flags present. This usually indicates an issue with the instrument or spacecraft.
 writetodict (bool) –
If writetodict is True, adds the following columns to the lcdict:
rfepd_time = time array
rfepd_sapflux = uncorrected flux before EPD
rfepd_epdsapflux = corrected flux after EPD
rfepd_epdsapcorr = EPD flux corrections
rfepd_bkg = background array
rfepd_bkg_err = background errors array
rfepd_xcc = x-coord array
rfepd_ycc = y-coord array
rfepd_quality = quality flag array
and updates the ‘columns’ list in the lcdict as well.
 epdsmooth (int) – Sets the number of light curve points to smooth over when generating the EPD fit function.
 decorr ({'xcc,ycc','bgv,bge','xcc,ycc,bgv,bge'}) – Indicates whether to use the x,y coords alone; the background value and error alone; or the x,y coords together with the background value and error as the features to train the RandomForestRegressor on and perform the fit.
 nrftrees (int) – The number of trees to use in the RandomForestRegressor.
Returns: Returns a tuple of the form: (times, corrected_fluxes, flux_corrections)
Return type: tuple

astrobase.astrokep.detrend_centroid(lcd, detrend='legendre', sigclip=None, mingap=0.5)
Detrends the x and y coordinate centroids for a Kepler light curve.
Given an lcdict for a single quarter of Kepler data, returned by read_kepler_fitslc, this function returns this same dictionary, appending detrended centroid_x and centroid_y values.
Here “detrended” means “finite, SAP quality flag set to 0, sigma clipped, timegroups selected based on mingap day gaps, then fit vs time by a legendre polynomial of lowish degree”.
Parameters:  lcd (lcdict) – An lcdict generated by the read_kepler_fitslc function.
 detrend ({'legendre'}) – Method by which to detrend the LC. ‘legendre’ is the only thing implemented at the moment.
 sigclip (None or float or int or sequence of floats/ints) – Determines the type and amount of sigma-clipping done on the light curve to remove outliers. If None, no sigma-clipping is performed. If a two element sequence of floats/ints, the first element corresponds to the fainter sigma-clip limit, and the second element corresponds to the brighter sigma-clip limit.
 mingap (float) – Number of days by which to define “timegroups” (for fitting each timegroup individually, and to eliminate “burn-in” of the Kepler spacecraft). For long cadence data, 0.5 days is typical.
Returns: This is of the form (lcd, errflag), where:
lcd : an lcdict with the new key lcd['centroids'], containing the detrended times, (centroid_x, centroid_y) values, and their errors.
errflag : boolean error flag, could be raised at various points.
Return type: tuple
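The notion of “timegroups” defined by mingap can be sketched as a split of a sorted time series wherever consecutive points are more than mingap days apart. This is an illustration only; find_timegroups is a hypothetical helper, not an astrobase function, and it assumes the times are already sorted and finite.

```python
def find_timegroups(times, mingap=0.5):
    """Split sorted times into groups separated by gaps larger than mingap days."""
    groups = []
    current = []
    for t in times:
        # a gap larger than mingap closes the current group
        if current and (t - current[-1]) > mingap:
            groups.append(current)
            current = []
        current.append(t)
    if current:
        groups.append(current)
    return groups

times = [0.00, 0.02, 0.04, 1.10, 1.12, 3.00]
groups = find_timegroups(times, mingap=0.5)
```

Each resulting group would then be fit separately, which keeps a polynomial from having to bridge a data gap.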

astrobase.astrokep.get_centroid_offsets(lcd, t_ing_egr, oot_buffer_time=0.1, sample_factor=3)
After running detrend_centroid, this gets positions of centroids during transits, and outside of transits.
These positions can then be used in a false positive analysis.
This routine requires knowing the ingress and egress times for every transit of interest within the quarter this routine is being called for. There is currently no astrobase routine that automates this for periodic transits (it must be done in a calling routine).
To get out of transit centroids, this routine takes points outside of the “buffer” set by oot_buffer_time, sampling 3x as many points on either side of the transit as are in the transit (or however many are specified by sample_factor).
Parameters:  lcd (lcdict) – An lcdict generated by the read_kepler_fitslc function. We assume that the detrend_centroid function has been run on this lcdict.
 t_ing_egr (list of tuples) –
This is of the form:
[(ingress time of i^th transit, egress time of i^th transit)]
for i the transit number index in this quarter (starts at zero at the beginning of every quarter). Assumes units of BJD.
 oot_buffer_time (float) – Number of days away from ingress and egress times to begin sampling “out of transit” centroid points. The number of out of transit points to take per transit is 3x the number of points in transit.
 sample_factor (float) – The size of out of transit window from which to sample.
Returns: This is a dictionary keyed by transit number (i.e., the same index as t_ing_egr), where each key contains the following value:
{'ctd_x_in_tra':ctd_x_in_tra, 'ctd_y_in_tra':ctd_y_in_tra, 'ctd_x_oot':ctd_x_oot, 'ctd_y_oot':ctd_y_oot, 'npts_in_tra':len(ctd_x_in_tra), 'npts_oot':len(ctd_x_oot), 'in_tra_times':in_tra_times, 'oot_times':oot_times}
Return type: dict
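The in-transit and out-of-transit selection can be sketched for a single transit as below. The exact window definition used by get_centroid_offsets is not given in this documentation, so the sketch assumes that each out-of-transit window starts oot_buffer_time days away from ingress or egress and spans sample_factor times the transit duration. The helper name split_transit_points and the sample times are hypothetical.

```python
def split_transit_points(times, t_ing, t_egr, oot_buffer_time=0.1, sample_factor=3):
    """Return (in_transit_times, out_of_transit_times) for one transit."""
    # assumed length of each out-of-transit window, in days
    window = sample_factor * (t_egr - t_ing)
    in_tra, oot = [], []
    for t in times:
        if t_ing <= t <= t_egr:
            in_tra.append(t)
        elif (t_ing - oot_buffer_time - window) < t < (t_ing - oot_buffer_time):
            oot.append(t)  # before ingress, beyond the buffer
        elif (t_egr + oot_buffer_time) < t < (t_egr + oot_buffer_time + window):
            oot.append(t)  # after egress, beyond the buffer
    return in_tra, oot

# evenly sampled times from 0.00 to 2.00 days in 0.01-day steps
times = [i / 100 for i in range(201)]
in_tra, oot = split_transit_points(times, t_ing=0.955, t_egr=1.045)
```

With evenly sampled data and the default sample_factor of 3, this picks up three times as many points on each side of the transit as there are in transit.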
astrobase.astrotess module
Contains various tools for analyzing TESS light curves.

astrobase.astrotess.normalized_flux_to_mag(lcdict, columns=('sap.sap_flux', 'sap.sap_flux_err', 'sap.sap_bkg', 'sap.sap_bkg_err', 'pdc.pdcsap_flux', 'pdc.pdcsap_flux_err'))
This converts the normalized fluxes in the TESS lcdicts to TESS mags.
Uses the object’s TESS mag stored in lcdict['objectinfo']['tessmag']:
mag - object_tess_mag = -2.5 log (flux/median_flux)
Parameters:  lcdict (lcdict) – An lcdict produced by read_tess_fitslc or consolidate_tess_fitslc. This must have normalized fluxes in its measurement columns (use the normalize kwarg for these functions).
 columns (sequence of str) – The column keys of the normalized flux and background measurements in the lcdict to operate on and convert to magnitudes in TESS band (T).
Returns: The returned lcdict will contain extra columns corresponding to magnitudes for each input normalized flux/background column.
Return type: lcdict
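The relation used by normalized_flux_to_mag can be sketched for individual values. Since the input fluxes are already median-normalized, flux/median_flux is just the normalized flux. This is an illustration in plain Python, not the package's code; normalized_flux_to_tessmag is a hypothetical name and the magnitude of 10.0 is an invented example value.

```python
import math

def normalized_flux_to_tessmag(flux, object_tess_mag):
    """Median-normalized flux (1.0 at the median) -> TESS-band magnitude."""
    return object_tess_mag - 2.5 * math.log10(flux)

# a 1% dip makes the star fainter (larger magnitude); a 1% rise makes it brighter
mags = [normalized_flux_to_tessmag(f, 10.0) for f in (1.0, 0.99, 1.01)]
```

A normalized flux of exactly 1.0 returns the catalog TESS magnitude unchanged.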

astrobase.astrotess.read_tess_fitslc(lcfits, headerkeys=['EXPOSURE', 'TIMEREF', 'TASSIGN', 'TIMESYS', 'BJDREFI', 'BJDREFF', 'TELAPSE', 'LIVETIME', 'INT_TIME', 'NUM_FRM', 'TIMEDEL', 'BACKAPP', 'DEADAPP', 'VIGNAPP', 'GAINA', 'GAINB', 'GAINC', 'GAIND', 'READNOIA', 'READNOIB', 'READNOIC', 'READNOID', 'CDPP0_5', 'CDPP1_0', 'CDPP2_0', 'PDCVAR', 'PDCMETHD', 'CROWDSAP', 'FLFRCSAP', 'NSPSDDET', 'NSPSDCOR'], datakeys=['TIME', 'TIMECORR', 'CADENCENO', 'QUALITY', 'PSF_CENTR1', 'PSF_CENTR1_ERR', 'PSF_CENTR2', 'PSF_CENTR2_ERR', 'MOM_CENTR1', 'MOM_CENTR1_ERR', 'MOM_CENTR2', 'MOM_CENTR2_ERR', 'POS_CORR1', 'POS_CORR2'], sapkeys=['SAP_FLUX', 'SAP_FLUX_ERR', 'SAP_BKG', 'SAP_BKG_ERR'], pdckeys=['PDCSAP_FLUX', 'PDCSAP_FLUX_ERR'], topkeys=['DATE-OBS', 'DATE-END', 'PROCVER', 'ORIGIN', 'DATA_REL', 'TIMVERSN', 'OBJECT', 'TICID', 'SECTOR', 'CAMERA', 'CCD', 'PXTABLE', 'RADESYS', 'RA_OBJ', 'DEC_OBJ', 'EQUINOX', 'PMRA', 'PMDEC', 'PMTOTAL', 'TESSMAG', 'TEFF', 'LOGG', 'MH', 'RADIUS', 'TICVER', 'CRMITEN', 'CRBLKSZ', 'CRSPOC'], apkeys=['NPIXSAP', 'NPIXMISS', 'CDELT1', 'CDELT2'], normalize=False, appendto=None, filterqualityflags=False, nanfilter=None, timestoignore=None)
This extracts the light curve from a single TESS .lc.fits file.
This works on the light curves available at MAST.
TODO: look at:
https://archive.stsci.edu/missions/tess/doc/EXPTESSARCICDTM0014.pdf
for details on the column descriptions and to fill in any other info we need.
Parameters:  lcfits (str) – The filename of a MAST TESS light curve FITS file.
 headerkeys (list) – A list of FITS header keys that will be extracted from the FITS light curve file. These describe the observations. The default value for this is given in LCHEADERKEYS above.
 datakeys (list) – A list of FITS column names that correspond to the auxiliary measurements in the light curve. The default is LCDATAKEYS above.
 sapkeys (list) – A list of FITS column names that correspond to the SAP flux measurements in the light curve. The default is LCSAPKEYS above.
 pdckeys (list) – A list of FITS column names that correspond to the PDC flux measurements in the light curve. The default is LCPDCKEYS above.
 topkeys (list) – A list of FITS header keys that describe the object in the light curve. The default is LCTOPKEYS above.
 apkeys (list) – A list of FITS header keys that describe the flux measurement apertures used by the TESS pipeline. The default is LCAPERTUREKEYS above.
 normalize (bool) – If True, then the light curve’s SAP_FLUX and PDCSAP_FLUX measurements will be normalized to 1.0 by dividing out the median flux for the component light curve.
 appendto (lcdict or None) – If appendto is an lcdict, will append measurements of this lcdict to that lcdict. This is used for consolidating light curves for the same object across different files (sectors/cameras/CCDs?). The appending does not care about the time order. To consolidate light curves in time order, use consolidate_tess_fitslc below.
 filterqualityflags (bool) – If True, will remove any measurements that have nonzero quality flags present. This usually indicates an issue with the instrument or spacecraft.
 nanfilter ({'sap','pdc','sap,pdc'} or None) – Indicates the flux measurement type(s) to apply the filtering to.
 timestoignore (list of tuples or None) –
This is of the form:
[(time1_start, time1_end), (time2_start, time2_end), ...]
and indicates the start and end times to mask out of the final lcdict. Use this to remove anything that wasn’t caught by the quality flags.
Returns: Returns an lcdict (this is useable by most astrobase functions for LC processing).
Return type: lcdict

astrobase.astrotess.consolidate_tess_fitslc(lclist, normalize=True, filterqualityflags=False, nanfilter=None, timestoignore=None, headerkeys=['EXPOSURE', 'TIMEREF', 'TASSIGN', 'TIMESYS', 'BJDREFI', 'BJDREFF', 'TELAPSE', 'LIVETIME', 'INT_TIME', 'NUM_FRM', 'TIMEDEL', 'BACKAPP', 'DEADAPP', 'VIGNAPP', 'GAINA', 'GAINB', 'GAINC', 'GAIND', 'READNOIA', 'READNOIB', 'READNOIC', 'READNOID', 'CDPP0_5', 'CDPP1_0', 'CDPP2_0', 'PDCVAR', 'PDCMETHD', 'CROWDSAP', 'FLFRCSAP', 'NSPSDDET', 'NSPSDCOR'], datakeys=['TIME', 'TIMECORR', 'CADENCENO', 'QUALITY', 'PSF_CENTR1', 'PSF_CENTR1_ERR', 'PSF_CENTR2', 'PSF_CENTR2_ERR', 'MOM_CENTR1', 'MOM_CENTR1_ERR', 'MOM_CENTR2', 'MOM_CENTR2_ERR', 'POS_CORR1', 'POS_CORR2'], sapkeys=['SAP_FLUX', 'SAP_FLUX_ERR', 'SAP_BKG', 'SAP_BKG_ERR'], pdckeys=['PDCSAP_FLUX', 'PDCSAP_FLUX_ERR'], topkeys=['DATE-OBS', 'DATE-END', 'PROCVER', 'ORIGIN', 'DATA_REL', 'TIMVERSN', 'OBJECT', 'TICID', 'SECTOR', 'CAMERA', 'CCD', 'PXTABLE', 'RADESYS', 'RA_OBJ', 'DEC_OBJ', 'EQUINOX', 'PMRA', 'PMDEC', 'PMTOTAL', 'TESSMAG', 'TEFF', 'LOGG', 'MH', 'RADIUS', 'TICVER', 'CRMITEN', 'CRBLKSZ', 'CRSPOC'], apkeys=['NPIXSAP', 'NPIXMISS', 'CDELT1', 'CDELT2'])
This consolidates a list of LCs for a single TIC object.
NOTE: if light curve time arrays contain nans, these and their associated measurements will be sorted to the end of the final combined arrays.
Parameters:  lclist (list of str, or str) – lclist is either a list of actual light curve files or a string that is valid for glob.glob to search for and generate a light curve list based on the file glob. This is useful for consolidating LC FITS files across different TESS sectors for a single TIC ID using a glob like *<TICID>*_lc.fits.
 normalize (bool) – If True, then the light curve’s SAP_FLUX and PDCSAP_FLUX measurements will be normalized to 1.0 by dividing out the median flux for the component light curve.
 filterqualityflags (bool) – If True, will remove any measurements that have nonzero quality flags present. This usually indicates an issue with the instrument or spacecraft.
 nanfilter ({'sap','pdc','sap,pdc'} or None) – Indicates the flux measurement type(s) to apply the filtering to.
 timestoignore (list of tuples or None) –
This is of the form:
[(time1_start, time1_end), (time2_start, time2_end), ...]
and indicates the start and end times to mask out of the final lcdict. Use this to remove anything that wasn’t caught by the quality flags.
 headerkeys (list) – A list of FITS header keys that will be extracted from the FITS light curve file. These describe the observations. The default value for this is given in LCHEADERKEYS above.
 datakeys (list) – A list of FITS column names that correspond to the auxiliary measurements in the light curve. The default is LCDATAKEYS above.
 sapkeys (list) – A list of FITS column names that correspond to the SAP flux measurements in the light curve. The default is LCSAPKEYS above.
 pdckeys (list) – A list of FITS column names that correspond to the PDC flux measurements in the light curve. The default is LCPDCKEYS above.
 topkeys (list) – A list of FITS header keys that describe the object in the light curve. The default is LCTOPKEYS above.
 apkeys (list) – A list of FITS header keys that describe the flux measurement apertures used by the TESS pipeline. The default is LCAPERTUREKEYS above.
Returns: Returns an lcdict (this is useable by most astrobase functions for LC processing).
Return type: lcdict

astrobase.astrotess.tess_lcdict_to_pkl(lcdict, outfile=None)
This writes the lcdict to a Python pickle.
Parameters:  lcdict (lcdict) – This is the input lcdict to write to a pickle.
 outfile (str or None) – If this is None, the object’s ID will be determined from the lcdict and used to form the filename of the output pickle file. If this is a str, the provided filename will be used.
Returns: The absolute path to the written pickle file.
Return type: str

astrobase.astrotess.read_tess_pklc(picklefile)
This turns the pickled lightcurve file back into an lcdict.
Parameters: picklefile (str) – The path to a previously written TESS LC picklefile generated by tess_lcdict_to_pkl above.
Returns: Returns an lcdict (this is useable by most astrobase functions for LC processing).
Return type: lcdict

astrobase.astrotess.filter_tess_lcdict(lcdict, filterqualityflags=True, nanfilter='sap, pdc, time', timestoignore=None, quiet=False)
This filters the provided TESS lcdict, removing nans and bad observations.
By default, this function removes points in the TESS LC that have ANY quality flags set.
Parameters:  lcdict (lcdict) – An lcdict produced by consolidate_tess_fitslc or read_tess_fitslc.
 filterqualityflags (bool) – If True, will remove any measurements that have nonzero quality flags present. This usually indicates an issue with the instrument or spacecraft.
 nanfilter ({'sap','pdc','sap,pdc'}) – Indicates the flux measurement type(s) to apply the filtering to.
 timestoignore (list of tuples or None) –
This is of the form:
[(time1_start, time1_end), (time2_start, time2_end), ...]
and indicates the start and end times to mask out of the final lcdict. Use this to remove anything that wasn’t caught by the quality flags.
Returns: Returns an lcdict (this is useable by most astrobase functions for LC processing). The lcdict is filtered IN PLACE!
Return type: lcdict
astrobase.hatsurveys package
Submodules
astrobase.hatsurveys.hatlc module
This contains functions to read HAT sqlite (“sqlitecurves”) and CSV light curves generated by the new HAT data server.
The most useful functions in this module are:
read_csvlc(lcfile):
This reads a CSV light curve produced by the HAT data server into an
lcdict.
lcfile is the HAT gzipped CSV LC (with a .hatlc.csv.gz extension)
And:
read_and_filter_sqlitecurve(lcfile, columns=None, sqlfilters=None,
raiseonfail=False, forcerecompress=False):
This reads a sqlitecurve file and optionally filters it, returns an
lcdict.
Returns columns requested in columns. If None, then returns all columns
present in the latest columnlist in the lightcurve. See COLUMNDEFS for
the full list of HAT LC columns.
If sqlfilters is not None, it must be a list of text SQL filters that
apply to the columns in the lightcurve.
This returns an lcdict with an added 'lcfiltersql' key that indicates
what the parsed SQL filter string was.
If forcerecompress = True, will recompress the ungzipped sqlitecurve
even if the gzipped form exists on disk already.
Finally:
describe(lcdict):
This describes the metadata of the light curve.
Command line usage
You can call this module directly from the command line:
If you just have this file alone:
$ chmod +x hatlc.py
$ ./hatlc.py --help
If astrobase is installed with pip, etc., this will be on your path already:
$ hatlc --help
These should give you the following:
usage: hatlc.py [-h] [--describe] hatlcfile
read a HAT LC of any format and output to stdout
positional arguments:
hatlcfile path to the light curve you want to read and pipe to stdout
optional arguments:
-h, --help show this help message and exit
--describe don't dump the columns, show only object info and LC metadata
Either one will dump any HAT LC recognized to stdout (or just dump the description if requested).
Other useful functions
Two other functions that might be useful:
normalize_lcdict(lcdict, timecol='rjd', magcols='all', mingap=4.0,
normto='sdssr', debugmode=False):
This normalizes magnitude columns (specified in the magcols keyword
argument) in an lcdict obtained from reading a HAT light curve. This
normalization is done by finding 'timegroups' in each magnitude column,
assuming that these belong to different 'eras' separated by a specified
gap in the mingap keyword argument, and thus may be offset vertically
from one another. Measurements within a timegroup are normalized to zero
using the median magnitude of the timegroup. Once all timegroups have
been processed this way, the whole time series is then renormalized to
the specified value in the normto keyword argument.
And:
normalize_lcdict_byinst(lcdict, magcols='all', normto='sdssr',
normkeylist=('stf','ccd','flt','fld','prj','exp'),
debugmode=False)
This normalizes magnitude columns (specified in the magcols keyword
argument) in an lcdict obtained from reading a HAT light curve. This
normalization is done by generating a normalization key using columns in
the lcdict that specify various instrument properties. The default
normalization key (specified in the normkeylist kwarg) is a combination
of:
 HAT station IDs ('stf')
 camera position ID ('ccd'; useful for HATSouth observations)
 camera filters ('flt')
 observed HAT field names ('fld')
 HAT project IDs ('prj')
 camera exposure times ('exp')
with the assumption that measurements with identical normalization keys
belong to a single 'era'. Measurements within an era are normalized to
zero using the median magnitude of the era. Once all eras have been
processed this way, the whole time series is then renormalized to the
specified value in the normto keyword argument.
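The per-era normalization described above can be sketched with NumPy. This is an illustration under stated assumptions, not the astrobase implementation: normalize_by_key is a hypothetical helper, the key columns are passed in as plain sequences, and normto is taken to be a numeric target rather than a catalog magnitude key.

```python
import numpy as np


def normalize_by_key(mags, keycols, normto=0.0):
    """Median-normalize mags within each group of identical key tuples.

    Hypothetical helper: keycols is a list of equal-length sequences
    (e.g. station IDs, filters) that together form the normalization key.
    """
    mags = np.asarray(mags, dtype=float)
    # one normalization key per measurement, e.g. (5, 'r')
    keys = list(zip(*keycols))
    out = mags.copy()
    for key in set(keys):
        mask = np.array([k == key for k in keys])
        # normalize this era to zero using its median magnitude
        out[mask] = mags[mask] - np.nanmedian(mags[mask])
    # renormalize the whole series to the requested target value
    return out + normto


# made-up values: two stations observing in the same filter
stf = [5, 5, 7, 7]
flt = ["r", "r", "r", "r"]
mags = [12.0, 12.2, 12.5, 12.7]
normed = normalize_by_key(mags, [stf, flt], normto=11.0)
print(normed)
```

The offset between the two stations is removed because each era is shifted to its own median before the common target is added back.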
There’s an IPython notebook describing the use of this module and accompanying modules from the astrobase package at:
https://github.com/waqasbhatti/astrobasenotebooks/blob/master/lightcurvework.ipynb

astrobase.hatsurveys.hatlc.
read_and_filter_sqlitecurve
(lcfile, columns=None, sqlfilters=None, raiseonfail=False, returnarrays=True, forcerecompress=False, quiet=True)[source]¶ This reads a HAT sqlitecurve and optionally filters it.
Parameters:  lcfile (str) – The path to the HAT sqlitecurve file.
 columns (list) – A list of columns to extract from the light curve file. If None, then returns all columns present in the latest columnlist in the light curve.
 sqlfilters (list of str) – If not None, it must be a list of text SQL filters that apply to the columns in the lightcurve.
 raiseonfail (bool) – If this is True, an Exception when reading the LC will crash the function instead of failing silently and returning None as the result.
 returnarrays (bool) – If this is True, the output lcdict contains columns as np.arrays instead of lists. You generally want this to be True.
 forcerecompress (bool) – If True, the sqlitecurve will be recompressed even if a compressed version of it is found. This usually happens when sqlitecurve opening is interrupted by the OS for some reason, leaving behind a gzipped and ungzipped copy. By default, this function refuses to overwrite the existing gzipped version so if the ungzipped version is corrupt but that one isn’t, it can be safely recovered.
 quiet (bool) – If True, will not warn about any problems, even if the light curve reading fails (the only clue then will be the return value of None). Useful for batch processing of many many light curves.
Returns: tuple – A two-element tuple is returned, with the first element being the lcdict and the second a status message.
Return type: (lcdict, status_message)
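To illustrate how a column selection and text SQL filters might combine into one query, here is a sketch using the standard-library sqlite3 module on an in-memory table. Everything in it is an assumption for illustration: the table name lightcurve, the aep_000 and aie_000 column names, the filtered_rows helper, and the choice to join filters with AND. It performs no validation of the filter text, so it should not be fed untrusted input.

```python
import sqlite3


def filtered_rows(conn, columns, sqlfilters=None):
    """Select columns from a hypothetical 'lightcurve' table, ANDing filters."""
    query = "select {} from lightcurve".format(", ".join(columns))
    if sqlfilters:
        # assumption: multiple filters are combined with AND
        query += " where " + " and ".join("({})".format(f) for f in sqlfilters)
    return conn.execute(query).fetchall(), query


conn = sqlite3.connect(":memory:")
conn.execute("create table lightcurve (rjd real, aep_000 real, aie_000 real)")
conn.executemany(
    "insert into lightcurve values (?, ?, ?)",
    [(55000.1, 12.01, 0.01), (55000.2, 12.50, 0.20), (55000.3, 11.99, 0.02)],
)

# keep only measurements with small errors
rows, query = filtered_rows(conn, ["rjd", "aep_000"], ["aie_000 < 0.05"])
print(query)
print(rows)
```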

astrobase.hatsurveys.hatlc.
describe
(lcdict, returndesc=False, offsetwith=None)[source]¶ This describes the light curve object and columns present.
Parameters:  lcdict (dict) – The input lcdict to parse for column and metadata info.
 returndesc (bool) – If True, returns the description string as an str instead of just printing it to stdout.
 offsetwith (str) – This is a character to offset the output description lines by. This is useful to add comment characters like ‘#’ to the output description lines.
Returns: If returndesc is True, returns the description lines as a str, otherwise returns nothing.
Return type: str or None
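The effect of offsetwith can be sketched as a line-prefixing step. The helper name offset_description is hypothetical, and the single space placed between the offset character and each line is an assumption; the actual output format may differ.

```python
def offset_description(desc, offsetwith=None):
    """Prefix every line of desc with offsetwith (hypothetical helper)."""
    if offsetwith is None:
        return desc
    # assumption: one space separates the offset character from the text
    return "\n".join(
        "{} {}".format(offsetwith, line) for line in desc.splitlines()
    )


commented = offset_description("OBJECT\ncolumns: rjd, mag", offsetwith="#")
print(commented)
```

A description prefixed this way can be written at the top of a text light curve file as a comment header.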

astrobase.hatsurveys.hatlc.
read_lcc_csvlc
(lcfile)[source]¶ This reads a CSV LC produced by an LCCServer instance.
Parameters: lcfile (str) – The LC file to read. Returns: Returns an lcdict that’s readable by most astrobase functions for further processing. Return type: dict

astrobase.hatsurveys.hatlc.
describe_lcc_csv
(lcdict, returndesc=False)[source]¶ This describes the LCC CSV format light curve file.
Parameters:  lcdict (dict) – The input lcdict to parse for column and metadata info.
 returndesc (bool) – If True, returns the description string as an str instead of just printing it to stdout.
Returns: If returndesc is True, returns the description lines as a str, otherwise returns nothing.
Return type: str or None

astrobase.hatsurveys.hatlc.
read_csvlc
(lcfile)[source]¶ This reads a HAT data server or LCCServer produced CSV light curve into an lcdict.
This will automatically figure out the format of the file provided. Currently, it can read:
 legacy HAT data server CSV LCs (e.g. from https://hatsouth.org/planets/lightcurves.html) with an extension of the form: .hatlc.csv.gz.
 all LCCServer produced LCC-CSV-V1 LCs (e.g. from https://data.hatsurveys.org) with an extension of the form: csvlc.gz.
Parameters: lcfile (str) – The light curve file to read. Returns: Returns an lcdict that can be read and used by many astrobase processing functions. Return type: dict

astrobase.hatsurveys.hatlc.
find_lc_timegroups
(lctimes, mingap=4.0)[source]¶ This finds the time gaps in the light curve, so we can figure out which times are for consecutive observations and which represent gaps between seasons.
Parameters:  lctimes (np.array) – This is the input array of times, assumed to be in some form of JD.
 mingap (float) – The minimum gap between consecutive measurements for them to be considered parts of different timegroups. By default it is set to 4.0 days.
Returns: A tuple of the form below is returned, containing the number of time groups found and Python slice objects for each group:
(ngroups, [slice(start_ind_1, end_ind_1), ...])
Return type: tuple
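A minimal sketch of how such timegroups could be found with NumPy, returning the same (ngroups, [slice, ...]) shape described above. The function name find_timegroups is hypothetical, and the sketch assumes the input times are already sorted in ascending order.

```python
import numpy as np


def find_timegroups(times, mingap=4.0):
    """Split sorted times into groups separated by gaps larger than mingap."""
    times = np.asarray(times, dtype=float)
    # indices i where times[i+1] - times[i] > mingap (last point of a group)
    gap_ends = np.where(np.diff(times) > mingap)[0]
    starts = np.concatenate(([0], gap_ends + 1))
    stops = np.concatenate((gap_ends + 1, [len(times)]))
    groups = [slice(int(a), int(b)) for a, b in zip(starts, stops)]
    return len(groups), groups


# made-up times with two gaps longer than 4 days
times = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 30.0])
ngroups, groups = find_timegroups(times, mingap=4.0)
print(ngroups, groups)
```

Each slice can be applied directly to the time and magnitude arrays, e.g. times[groups[1]], to pull out one observing season.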

astrobase.hatsurveys.hatlc.
normalize_lcdict
(lcdict, timecol='rjd', magcols='all', mingap=4.0, normto='sdssr', debugmode=False, quiet=False)[source]¶ This normalizes magcols in lcdict using timecol to find timegroups.
Parameters:  lcdict (dict) – The input lcdict to process.
 timecol (str) – The key in the lcdict that is to be used to extract the time column.
 magcols ('all' or list of str) – If this is ‘all’, all of the columns in the lcdict that are indicated to be magnitude measurement columns are normalized. If this is a list of str, must contain the keys of the lcdict specifying which magnitude columns will be normalized.
 mingap (float) – The minimum gap between consecutive measurements for them to be considered parts of different timegroups. By default it is set to 4.0 days.
 normto ({'globalmedian', 'zero', 'jmag', 'hmag', 'kmag', 'bmag', 'vmag', 'sdssg', 'sdssr', 'sdssi'}) – This indicates which column will be the normalization target. If this is ‘globalmedian’, the normalization will be to the global median of each LC column. If this is ‘zero’, will normalize to 0.0 for each LC column. Otherwise, will normalize to the value of one of the other keys in the lcdict[‘objectinfo’][magkey], meaning the normalization will be to some form of catalog magnitude.
 debugmode (bool) – If True, will indicate progress as timegroups are found and processed.
 quiet (bool) – If True, will not emit any messages when processing.
Returns: Returns the lcdict with the magnitude measurements normalized as specified. The normalization happens IN PLACE.
Return type: dict

astrobase.hatsurveys.hatlc.
normalize_lcdict_byinst
(lcdict, magcols='all', normto='sdssr', normkeylist=('stf', 'ccd', 'flt', 'fld', 'prj', 'exp'), debugmode=False, quiet=False)[source]¶ This is a function to normalize light curves across all instrument combinations present.
Use this to normalize a light curve containing a variety of:
 HAT station IDs (‘stf’)
 camera IDs (‘ccd’)
 filters (‘flt’)
 observed field names (‘fld’)
 HAT project IDs (‘prj’)
 exposure times (‘exp’)
Parameters:  lcdict (dict) – The input lcdict to process.
 magcols ('all' or list of str) – If this is ‘all’, all of the columns in the lcdict that are indicated to be magnitude measurement columns are normalized. If this is a list of str, must contain the keys of the lcdict specifying which magnitude columns will be normalized.
 normto ({'zero', 'jmag', 'hmag', 'kmag', 'bmag', 'vmag', 'sdssg', 'sdssr', 'sdssi'}) – This indicates which column will be the normalization target. If this is ‘zero’, will normalize to 0.0 for each LC column. Otherwise, will normalize to the value of one of the other keys in the lcdict[‘objectinfo’][magkey], meaning the normalization will be to some form of catalog magnitude.
 normkeylist (list of str) – These are the column keys to use to form the normalization index. Measurements in the specified magcols with identical normalization index values will be considered as part of a single measurement ‘era’, and will be normalized to zero. Once all eras have been normalized this way, the final light curve will be renormalized as specified in normto.
 debugmode (bool) – If True, will indicate progress as timegroups are found and processed.
 quiet (bool) – If True, will not emit any messages when processing.
Returns: Returns the lcdict with the magnitude measurements normalized as specified. The normalization happens IN PLACE.
Return type: dict

astrobase.hatsurveys.hatlc.
main
()[source]¶ This is called when we’re executed from the command line.
The current usage from the command line is described below:
usage: hatlc [-h] [--describe] hatlcfile
read a HAT LC of any format and output to stdout
positional arguments:
  hatlcfile   path to the light curve you want to read and pipe to stdout
optional arguments:
  -h, --help  show this help message and exit
  --describe  don't dump the columns, show only object info and LC metadata
astrobase.hatsurveys.k2hat module¶
This contains functions for reading K2 CSV lightcurves produced by the HAT Project into a Python dictionary. Requires numpy.
The only external function here is:
read_csv_lightcurve(lcfile)
Example:
Reading the best aperture LC for EPIC201183188 = UCAC4428055298 (see http://k2.hatsurveys.org to search for this object and download the light curve):
>>> from astrobase.hatsurveys import k2hat
>>> lcdict = k2hat.read_csv_lightcurve('UCAC442805529875d3f4357b314ff5ac458e917e6dfeb964877b60affe9193d4f65088k2lc.csv.gz')
The Python dict lcdict contains the metadata and all columns.
>>> lcdict.keys()
['decl', 'objectid', 'bjdoffset', 'qualflag', 'fovchannel', 'BGV',
'aperpixradius', 'IM04', 'TF17', 'EP01', 'CF01', 'ra', 'fovmodule', 'columns',
'k2campaign', 'EQ01', 'fovccd', 'FRN', 'IE04', 'kepid', 'YCC', 'XCC', 'BJD',
'napertures', 'ucac4id', 'IQ04', 'kepmag', 'ndet','kernelspec']
The columns for the light curve are stored in the columns key of the dict. To get a list of the columns:
>>> lcdict['columns']
['BJD', 'BGV', 'FRN', 'XCC', 'YCC', 'IM04', 'IE04', 'IQ04', 'EP01', 'EQ01',
'TF17', 'CF01']
To get columns:
>>> bjd, epdmags = lcdict['BJD'], lcdict['EP01']
>>> bjd
array([ 2456808.1787283, 2456808.1991608, 2456808.2195932, ...,
2456890.2535691, 2456890.274001 , 2456890.2944328])
>>> epdmags
array([ 16.03474, 16.02773, 16.01826, ..., 15.76997, 15.76577,
15.76263])
astrobase.astrokep: contains functions for dealing with Kepler and K2 Mission light curves from STScI MAST (reading the FITS files, consolidating light curves for objects over quarters), and some basic operations (converting fluxes to mags, decorrelation of light curves, filtering light curves, and fitting object centroids for eclipse analysis, etc.)
astrobase.astrotess: contains functions for dealing with TESS 2-minute cadence light curves from STScI MAST (reading the FITS files, consolidating light curves for objects over sectors), and some basic operations (converting fluxes to mags, filtering light curves, etc.)
astrobase.hatsurveys: modules to read, filter, and normalize light curves from various HAT surveys.
astrobase.periodbase.abls module¶
Contains the Kovacs, et al. (2002) Box-Least-squared-Search period-search algorithm implementation for periodbase. This uses the implementation in Astropy 3.1, so requires that version.

astrobase.periodbase.abls.
bls_serial_pfind
(times, mags, errs, magsarefluxes=False, startp=0.1, endp=100.0, stepsize=0.0005, mintransitduration=0.01, maxtransitduration=0.4, ndurations=100, autofreq=True, blsobjective='likelihood', blsmethod='fast', blsoversample=10, blsmintransits=3, blsfreqfactor=10.0, periodepsilon=0.1, nbestpeaks=5, sigclip=10.0, endp_timebase_check=True, verbose=True, raiseonfail=False)[source]¶ Runs the Box Least Squares Fitting Search for transit-shaped signals.
Based on the version of BLS in Astropy 3.1: astropy.stats.BoxLeastSquares. If you don’t have Astropy 3.1, this module will fail to import. Note that by default, this implementation of bls_serial_pfind doesn’t use the .autoperiod() function from BoxLeastSquares but uses the same auto frequency-grid generation as the functions in periodbase.kbls. If you want to use Astropy’s implementation, set the value of autofreq kwarg to ‘astropy’.
The dict returned from this function contains a blsmodel key, which is the generated model from Astropy’s BLS. Use the .compute_stats() method to calculate the required stats like SNR, depth, duration, etc.
Parameters:  times,mags,errs (np.array) – The magnitude/flux timeseries to search for transits.
 magsarefluxes (bool) – If the input measurement values in mags and errs are in fluxes, set this to True.
 startp,endp (float) – The minimum and maximum periods to consider for the transit search.
 stepsize (float) – The stepsize in frequency to use when constructing a frequency grid for the period search.
 mintransitduration,maxtransitduration (float) – The minimum and maximum transit durations (in units of phase) to consider for the transit search.
 ndurations (int) – The number of transit durations to use in the period search.
 autofreq (bool or str) –
If this is True, the values of stepsize and nphasebins will be ignored, and these, along with a frequency grid, will be determined based on the following relations:
nphasebins = int(ceil(2.0/mintransitduration))
if nphasebins > 3000:
    nphasebins = 3000
stepsize = 0.25*mintransitduration/(times.max() - times.min())
minfreq = 1.0/endp
maxfreq = 1.0/startp
nfreq = int(ceil((maxfreq - minfreq)/stepsize))
If this is False, you must set startp, endp, and stepsize as appropriate.
If this is str == ‘astropy’, will use the astropy.stats.BoxLeastSquares.autoperiod() function to calculate the frequency grid instead of the kbls method.
 blsobjective ({'likelihood','snr'}) – Sets the type of objective to optimize in the BoxLeastSquares.power() function.
 blsmethod ({'fast','slow'}) – Sets the type of method to use in the BoxLeastSquares.power() function.
 blsoversample (int) – Sets the oversample kwarg for the BoxLeastSquares.power() function.
 blsmintransits (int) – Sets the min_n_transits kwarg for the BoxLeastSquares.autoperiod() function.
 blsfreqfactor (float) – Sets the frequency_factor kwarg for the BoxLeastSquares.autoperiod() function.
 periodepsilon (float) – The fractional difference between successive values of ‘best’ periods when sorting by periodogram power to consider them as separate periods (as opposed to part of the same periodogram peak). This is used to avoid broad peaks in the periodogram and make sure the ‘best’ periods returned are all actually independent.
 nbestpeaks (int) – The number of ‘best’ peaks to return from the periodogram results, starting from the global maximum of the periodogram peak values.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 endp_timebase_check (bool) – If True, will check if the endp value is larger than the timebase of the observations. If it is, will change the endp value such that it is half of the timebase. If False, will allow an endp larger than the timebase of the observations.
 verbose (bool) – If this is True, will indicate progress and details about the frequency grid used for the period search.
 raiseonfail (bool) – If True, raises an exception if something goes wrong. Otherwise, returns None.
Returns: This function returns a dict, referred to as an lspinfo dict in other astrobase functions that operate on periodogram results. This is a standardized format across all astrobase period-finders, and is of the form below:
{'bestperiod': the best period value in the periodogram,
 'bestlspval': the periodogram peak associated with the best period,
 'nbestpeaks': the input value of nbestpeaks,
 'nbestlspvals': nbestpeaks-size list of best period peak values,
 'nbestperiods': nbestpeaks-size list of best periods,
 'lspvals': the full array of periodogram powers,
 'frequencies': the full array of frequencies considered,
 'periods': the full array of periods considered,
 'durations': the array of durations used to run BLS,
 'blsresult': Astropy BLS result object (BoxLeastSquaresResult),
 'blsmodel': Astropy BLS BoxLeastSquares object used for work,
 'stepsize': the actual stepsize used,
 'nfreq': the actual nfreq used,
 'mintransitduration': the input mintransitduration,
 'maxtransitduration': the input maxtransitduration,
 'method':'bls' -> the name of the period-finder method,
 'kwargs':{ dict of all of the input kwargs for record-keeping}}
Return type: dict
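The autofreq relations listed in the parameters above can be written as a small NumPy function. This is a sketch rather than the astrobase code: auto_frequency_grid is a hypothetical name, and the way the frequency array is laid out (starting at minfreq and advancing by stepsize) is an assumption.

```python
import numpy as np


def auto_frequency_grid(times, startp=0.1, endp=100.0, mintransitduration=0.01):
    """Build a frequency grid from the autofreq relations (hypothetical helper)."""
    times = np.asarray(times, dtype=float)
    nphasebins = int(np.ceil(2.0 / mintransitduration))
    if nphasebins > 3000:
        nphasebins = 3000
    stepsize = 0.25 * mintransitduration / (times.max() - times.min())
    minfreq = 1.0 / endp
    maxfreq = 1.0 / startp
    nfreq = int(np.ceil((maxfreq - minfreq) / stepsize))
    # assumption: the grid starts at minfreq and advances in steps of stepsize
    frequencies = minfreq + np.arange(nfreq) * stepsize
    return frequencies, stepsize, nphasebins


# made-up observation times spanning 50 days
times = np.linspace(0.0, 50.0, 1000)
frequencies, stepsize, nphasebins = auto_frequency_grid(times)
print(len(frequencies), stepsize, nphasebins)
```

Note how the step size shrinks as the time baseline grows, so long light curves produce much larger grids for the same period range.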

astrobase.periodbase.abls.
bls_parallel_pfind
(times, mags, errs, magsarefluxes=False, startp=0.1, endp=100.0, stepsize=0.0001, mintransitduration=0.01, maxtransitduration=0.4, ndurations=100, autofreq=True, blsobjective='likelihood', blsmethod='fast', blsoversample=5, blsmintransits=3, blsfreqfactor=10.0, nbestpeaks=5, periodepsilon=0.1, sigclip=10.0, endp_timebase_check=True, verbose=True, nworkers=None)[source]¶ Runs the Box Least Squares Fitting Search for transit-shaped signals.
Breaks up the full frequency space into chunks and passes them to parallel BLS workers.
Based on the version of BLS in Astropy 3.1: astropy.stats.BoxLeastSquares. If you don’t have Astropy 3.1, this module will fail to import. Note that by default, this implementation of bls_parallel_pfind doesn’t use the .autoperiod() function from BoxLeastSquares but uses the same auto frequency-grid generation as the functions in periodbase.kbls. If you want to use Astropy’s implementation, set the value of autofreq kwarg to ‘astropy’. The generated period array will then be broken up into chunks and sent to the individual workers.
NOTE: the combined BLS spectrum produced by this function is not identical to that produced by running BLS in one shot for the entire frequency space. There are differences on the order of 1.0e-3 or so in the respective peak values, but peaks appear at the same frequencies for both methods. This is likely due to different aliasing caused by smaller chunks of the frequency space used by the parallel workers in this function. When in doubt, confirm results for this parallel implementation by comparing to those from the serial implementation above.
In particular, when you want to get reliable estimates of the SNR, transit depth, duration, etc. that Astropy’s BLS gives you, rerun bls_serial_pfind with startp, and endp close to the best period you want to characterize the transit at. The dict returned from that function contains a blsmodel key, which is the generated model from Astropy’s BLS. Use the .compute_stats() method to calculate the required stats.
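The idea of breaking the frequency space into chunks and stitching the per-chunk results back together can be sketched as follows. The chunking scheme and the names chunk_frequencies and toy_power are assumptions for illustration; the toy function stands in for a BLS worker and is evaluated serially here to keep the example self-contained.

```python
import numpy as np


def chunk_frequencies(frequencies, nworkers):
    """Split a frequency grid into at most nworkers contiguous chunks."""
    # np.array_split tolerates lengths that do not divide evenly
    chunks = np.array_split(np.asarray(frequencies, dtype=float), nworkers)
    return [c for c in chunks if c.size > 0]


def toy_power(freqchunk):
    # stand-in for a per-chunk periodogram worker (not BLS)
    return np.sin(freqchunk) ** 2


frequencies = np.linspace(0.01, 10.0, 1001)
chunks = chunk_frequencies(frequencies, nworkers=4)

# each chunk would go to one worker; results are concatenated in order
combined = np.concatenate([toy_power(c) for c in chunks])
print(len(chunks), combined.shape)
```

Because this toy function treats each frequency independently, the stitched result matches a single-shot evaluation exactly; real BLS workers need not, as the NOTE above explains.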
Parameters:  times,mags,errs (np.array) – The magnitude/flux timeseries to search for transits.
 magsarefluxes (bool) – If the input measurement values in mags and errs are in fluxes, set this to True.
 startp,endp (float) – The minimum and maximum periods to consider for the transit search.
 stepsize (float) – The stepsize in frequency to use when constructing a frequency grid for the period search.
 mintransitduration,maxtransitduration (float) – The minimum and maximum transit durations (in units of phase) to consider for the transit search.
 ndurations (int) – The number of transit durations to use in the period search.
 autofreq (bool or str) –
If this is True, the values of stepsize and nphasebins will be ignored, and these, along with a frequency grid, will be determined based on the following relations:
nphasebins = int(ceil(2.0/mintransitduration))
if nphasebins > 3000:
    nphasebins = 3000
stepsize = 0.25*mintransitduration/(times.max() - times.min())
minfreq = 1.0/endp
maxfreq = 1.0/startp
nfreq = int(ceil((maxfreq - minfreq)/stepsize))
If this is False, you must set startp, endp, and stepsize as appropriate.
If this is str == ‘astropy’, will use the astropy.stats.BoxLeastSquares.autoperiod() function to calculate the frequency grid instead of the kbls method.
 blsobjective ({'likelihood','snr'}) – Sets the type of objective to optimize in the BoxLeastSquares.power() function.
 blsmethod ({'fast','slow'}) – Sets the type of method to use in the BoxLeastSquares.power() function.
 blsoversample (int) – Sets the oversample kwarg for the BoxLeastSquares.power() function.
 blsmintransits (int) – Sets the min_n_transits kwarg for the BoxLeastSquares.autoperiod() function.
 blsfreqfactor (float) – Sets the frequency_factor kwarg for the BoxLeastSquares.autoperiod() function.
 periodepsilon (float) – The fractional difference between successive values of ‘best’ periods when sorting by periodogram power to consider them as separate periods (as opposed to part of the same periodogram peak). This is used to avoid broad peaks in the periodogram and make sure the ‘best’ periods returned are all actually independent.
 nbestpeaks (int) – The number of ‘best’ peaks to return from the periodogram results, starting from the global maximum of the periodogram peak values.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 endp_timebase_check (bool) – If True, will check if the endp value is larger than the timebase of the observations. If it is, will change the endp value such that it is half of the timebase. If False, will allow an endp larger than the timebase of the observations.
 verbose (bool) – If this is True, will indicate progress and details about the frequency grid used for the period search.
 nworkers (int or None) – The number of parallel workers to launch for periodsearch. If None, nworkers = NCPUS.
Returns: This function returns a dict, referred to as an lspinfo dict in other astrobase functions that operate on periodogram results. This is a standardized format across all astrobase period-finders, and is of the form below:
{'bestperiod': the best period value in the periodogram,
 'bestlspval': the periodogram peak associated with the best period,
 'nbestpeaks': the input value of nbestpeaks,
 'nbestlspvals': nbestpeaks-size list of best period peak values,
 'nbestperiods': nbestpeaks-size list of best periods,
 'lspvals': the full array of periodogram powers,
 'frequencies': the full array of frequencies considered,
 'periods': the full array of periods considered,
 'durations': the array of durations used to run BLS,
 'blsresult': Astropy BLS result object (BoxLeastSquaresResult),
 'blsmodel': Astropy BLS BoxLeastSquares object used for work,
 'stepsize': the actual stepsize used,
 'nfreq': the actual nfreq used,
 'mintransitduration': the input mintransitduration,
 'maxtransitduration': the input maxtransitduration,
 'method':'bls' -> the name of the period-finder method,
 'kwargs':{ dict of all of the input kwargs for record-keeping}}
Return type: dict
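The sigclip behaviour described in the parameters above, including the asymmetric case and the role of magsarefluxes, can be sketched as below. This is an illustration, not the astrobase routine: sigclip_magseries here is a stand-alone hypothetical function, and estimating the scatter as 1.483 times the median absolute deviation is an assumption.

```python
import numpy as np


def sigclip_magseries(times, mags, errs, sigclip=(10.0, 3.0), magsarefluxes=False):
    """Sigma-clip a mag/flux series; sigclip is None, a number, or (dim, bright)."""
    times, mags, errs = (np.asarray(a, dtype=float) for a in (times, mags, errs))
    # always drop non-finite elements first
    ok = np.isfinite(times) & np.isfinite(mags) & np.isfinite(errs)
    times, mags, errs = times[ok], mags[ok], errs[ok]
    if sigclip is None:
        return times, mags, errs
    if isinstance(sigclip, (int, float)):
        dimsig = brightsig = float(sigclip)
    else:
        dimsig, brightsig = sigclip
    med = np.median(mags)
    # assumption: robust scatter from the median absolute deviation
    stdev = 1.483 * np.median(np.abs(mags - med))
    dev = mags - med
    if magsarefluxes:
        # fluxes: a dimming is a lower value
        keep = (dev > -dimsig * stdev) & (dev < brightsig * stdev)
    else:
        # magnitudes: a dimming is a larger value
        keep = (dev < dimsig * stdev) & (dev > -brightsig * stdev)
    return times[keep], mags[keep], errs[keep]


# made-up series with one dimming (10.6) and one brightening (9.4)
mags = np.array([10.0, 10.1, 9.9, 10.1, 9.9, 10.0, 10.1, 9.9, 10.6, 9.4])
times = np.arange(mags.size, dtype=float)
errs = np.full(mags.size, 0.01)
ctimes, cmags, cerrs = sigclip_magseries(times, mags, errs, sigclip=[10.0, 3.0])
print(cmags)
```

With sigclip=[10., 3.] the dimming at 10.6 survives while the brightening at 9.4 is removed; passing magsarefluxes=True on the same numbers flips which of the two counts as the dimming.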
astrobase.periodbase.kbls module¶
Contains the Kovacs, et al. (2002) Box-Least-squared-Search period-search algorithm implementation for periodbase.

astrobase.periodbase.kbls.
bls_serial_pfind
(times, mags, errs, magsarefluxes=False, startp=0.1, endp=100.0, stepsize=0.0005, mintransitduration=0.01, maxtransitduration=0.4, nphasebins=200, autofreq=True, periodepsilon=0.1, nbestpeaks=5, sigclip=10.0, endp_timebase_check=True, verbose=True, get_stats=True)[source]¶ Runs the Box Least Squares Fitting Search for transit-shaped signals.
Based on eebls.f from Kovacs et al. 2002 and python-bls from Foreman-Mackey et al. 2015. This is the serial version (which is good enough in most cases because BLS in Fortran is fairly fast). If nfreq > 5e5, this will take a while.
Parameters:  times,mags,errs (np.array) – The magnitude/flux timeseries to search for transits.
 magsarefluxes (bool) – If the input measurement values in mags and errs are in fluxes, set this to True.
 startp,endp (float) – The minimum and maximum periods to consider for the transit search.
 stepsize (float) – The stepsize in frequency to use when constructing a frequency grid for the period search.
 mintransitduration,maxtransitduration (float) – The minimum and maximum transit durations (in units of phase) to consider for the transit search.
 nphasebins (int) – The number of phase bins to use in the period search.
 autofreq (bool) –
If this is True, the values of stepsize and nphasebins will be ignored, and these, along with a frequency grid, will be determined based on the following relations:
nphasebins = int(ceil(2.0/mintransitduration))
if nphasebins > 3000:
    nphasebins = 3000
stepsize = 0.25*mintransitduration/(times.max() - times.min())
minfreq = 1.0/endp
maxfreq = 1.0/startp
nfreq = int(ceil((maxfreq - minfreq)/stepsize))
 periodepsilon (float) – The fractional difference between successive values of ‘best’ periods when sorting by periodogram power to consider them as separate periods (as opposed to part of the same periodogram peak). This is used to avoid broad peaks in the periodogram and make sure the ‘best’ periods returned are all actually independent.
 nbestpeaks (int) – The number of ‘best’ peaks to return from the periodogram results, starting from the global maximum of the periodogram peak values.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 endp_timebase_check (bool) – If True, will check if the endp value is larger than the timebase of the observations. If it is, will change the endp value such that it is half of the timebase. If False, will allow an endp larger than the timebase of the observations.
 verbose (bool) – If this is True, will indicate progress and details about the frequency grid used for the period search.
 get_stats (bool) –
If True, runs bls_stats_singleperiod() for each of the best periods in the output and injects the output into the output dict so you only have to run this function to get the periods and their stats.
The output dict from this function will then contain a ‘stats’ key containing a list of dicts with statistics for each period in resultdict['nbestperiods']. These dicts will contain fit values of transit parameters after a trapezoid transit model is fit to the phased light curve at each period in resultdict['nbestperiods'], i.e. fit values for period, epoch, transit depth, duration, ingress duration, and the SNR of the transit.
NOTE: make sure to check the ‘fit_status’ key for each resultdict['stats'] item to confirm that the trapezoid transit model fit succeeded and that the stats calculated are valid.
Returns: This function returns a dict, referred to as an lspinfo dict in other astrobase functions that operate on periodogram results. This is a standardized format across all astrobase period-finders, and is of the form below:
{'bestperiod': the best period value in the periodogram,
 'bestlspval': the periodogram peak associated with the best period,
 'nbestpeaks': the input value of nbestpeaks,
 'nbestlspvals': nbestpeaks-size list of best period peak values,
 'nbestperiods': nbestpeaks-size list of best periods,
 'stats': BLS stats for each best period,
 'lspvals': the full array of periodogram powers,
 'frequencies': the full array of frequencies considered,
 'periods': the full array of periods considered,
 'blsresult': the result dict from the eebls.f wrapper function,
 'stepsize': the actual stepsize used,
 'nfreq': the actual nfreq used,
 'nphasebins': the actual nphasebins used,
 'mintransitduration': the input mintransitduration,
 'maxtransitduration': the input maxtransitduration,
 'method':'bls' -> the name of the period-finder method,
 'kwargs':{ dict of all of the input kwargs for record-keeping}}
Return type: dict
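The roles of nbestpeaks and periodepsilon in filling the nbestperiods and nbestlspvals entries can be sketched as a greedy selection over the periodogram. The pick_best_peaks name and the exact acceptance rule (fractional difference measured against every already accepted period) are assumptions; the rule used inside astrobase may differ in detail.

```python
import numpy as np


def pick_best_peaks(periods, lspvals, nbestpeaks=5, periodepsilon=0.1):
    """Greedily pick the strongest, mutually distinct periodogram peaks."""
    periods = np.asarray(periods, dtype=float)
    lspvals = np.asarray(lspvals, dtype=float)
    finite = np.isfinite(lspvals)
    periods, lspvals = periods[finite], lspvals[finite]
    order = np.argsort(lspvals)[::-1]  # strongest power first
    bestperiods, bestvals = [], []
    for idx in order:
        p = periods[idx]
        # accept only if far enough (fractionally) from every accepted period
        if all(abs(p - q) / q > periodepsilon for q in bestperiods):
            bestperiods.append(float(p))
            bestvals.append(float(lspvals[idx]))
        if len(bestperiods) == nbestpeaks:
            break
    return bestperiods, bestvals


# made-up periodogram with two broad peaks near 1 and 2 days
periods = [1.00, 1.01, 2.0, 2.05, 5.0]
lspvals = [0.90, 0.95, 0.5, 0.60, 0.3]
nbestperiods, nbestlspvals = pick_best_peaks(periods, lspvals, nbestpeaks=3)
print(nbestperiods, nbestlspvals)
```

The neighbouring samples at 1.00 and 2.0 days are skipped because they fall within 10 percent of stronger accepted periods, which is how broad peaks are kept from crowding out independent ones.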

astrobase.periodbase.kbls.
bls_parallel_pfind
(times, mags, errs, magsarefluxes=False, startp=0.1, endp=100.0, stepsize=0.0001, mintransitduration=0.01, maxtransitduration=0.4, nphasebins=200, autofreq=True, nbestpeaks=5, periodepsilon=0.1, sigclip=10.0, endp_timebase_check=True, verbose=True, nworkers=None, get_stats=True)[source]¶ Runs the Box Least Squares Fitting Search for transit-shaped signals.
Based on eebls.f from Kovacs et al. 2002 and python-bls from Foreman-Mackey et al. 2015. Breaks up the full frequency space into chunks and passes them to parallel BLS workers.
NOTE: the combined BLS spectrum produced by this function is not identical to that produced by running BLS in one shot for the entire frequency space. There are differences on the order of 1.0e-3 or so in the respective peak values, but peaks appear at the same frequencies for both methods. This is likely due to different aliasing caused by smaller chunks of the frequency space used by the parallel workers in this function. When in doubt, confirm results for this parallel implementation by comparing to those from the serial implementation above.
Parameters:  times,mags,errs (np.array) – The magnitude/flux timeseries to search for transits.
 magsarefluxes (bool) – If the input measurement values in mags and errs are in fluxes, set this to True.
 startp,endp (float) – The minimum and maximum periods to consider for the transit search.
 stepsize (float) – The stepsize in frequency to use when constructing a frequency grid for the period search.
 mintransitduration,maxtransitduration (float) – The minimum and maximum transit durations (in units of phase) to consider for the transit search.
 nphasebins (int) – The number of phase bins to use in the period search.
 autofreq (bool) –
If this is True, the values of stepsize and nphasebins will be ignored, and these, along with a frequency grid, will be determined based on the following relations:
nphasebins = int(ceil(2.0/mintransitduration))
if nphasebins > 3000:
    nphasebins = 3000
stepsize = 0.25*mintransitduration/(times.max() - times.min())
minfreq = 1.0/endp
maxfreq = 1.0/startp
nfreq = int(ceil((maxfreq - minfreq)/stepsize))
 periodepsilon (float) – The fractional difference between successive values of ‘best’ periods when sorting by periodogram power to consider them as separate periods (as opposed to part of the same periodogram peak). This is used to avoid broad peaks in the periodogram and make sure the ‘best’ periods returned are all actually independent.
 nbestpeaks (int) – The number of ‘best’ peaks to return from the periodogram results, starting from the global maximum of the periodogram peak values.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elements removed) will be passed through to the output.
 endp_timebase_check (bool) – If True, will check if the endp value is larger than the timebase of the observations. If it is, will change the endp value such that it is half of the timebase. If False, will allow an endp larger than the timebase of the observations.
 verbose (bool) – If this is True, will indicate progress and details about the frequency grid used for the period search.
 nworkers (int or None) – The number of parallel workers to launch for the period search. If None, nworkers = NCPUS.
 get_stats (bool) –
If True, runs bls_stats_singleperiod() for each of the best periods in the output and injects the output into the output dict so you only have to run this function to get the periods and their stats.
The output dict from this function will then contain a ‘stats’ key containing a list of dicts with statistics for each period in resultdict['nbestperiods']. These dicts will contain fit values of transit parameters after a trapezoid transit model is fit to the phased light curve at each period in resultdict['nbestperiods'], i.e. fit values for period, epoch, transit depth, duration, ingress duration, and the SNR of the transit.
NOTE: make sure to check the ‘fit_status’ key for each resultdict['stats'] item to confirm that the trapezoid transit model fit succeeded and that the stats calculated are valid.
Returns: This function returns a dict, referred to as an lspinfo dict in other astrobase functions that operate on periodogram results. This is a standardized format across all astrobase periodfinders, and is of the form below:
{'bestperiod': the best period value in the periodogram,
 'bestlspval': the periodogram peak associated with the best period,
 'nbestpeaks': the input value of nbestpeaks,
 'nbestlspvals': nbestpeaks-size list of best period peak values,
 'nbestperiods': nbestpeaks-size list of best periods,
 'stats': list of stats dicts returned for each best period,
 'lspvals': the full array of periodogram powers,
 'frequencies': the full array of frequencies considered,
 'periods': the full array of periods considered,
 'blsresult': list of result dicts from eebls.f wrapper functions,
 'stepsize': the actual stepsize used,
 'nfreq': the actual nfreq used,
 'nphasebins': the actual nphasebins used,
 'mintransitduration': the input mintransitduration,
 'maxtransitduration': the input maxtransitduration,
 'method': 'bls' -> the name of the period-finder method,
 'kwargs': {dict of all of the input kwargs for record-keeping}}
Return type: dict
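The autofreq relations listed for this function can be sketched in plain Python as follows. This is only an illustration of the documented arithmetic, not the astrobase implementation; the function name bls_autofreq_grid and its timebase argument (standing in for times.max() - times.min()) are hypothetical.

```python
from math import ceil


def bls_autofreq_grid(timebase, startp=0.1, endp=100.0,
                      mintransitduration=0.01, max_nphasebins=3000):
    """Apply the documented autofreq relations (hypothetical helper).

    `timebase` stands in for times.max() - times.min().
    """
    nphasebins = int(ceil(2.0 / mintransitduration))
    if nphasebins > max_nphasebins:
        nphasebins = max_nphasebins

    stepsize = 0.25 * mintransitduration / timebase
    minfreq = 1.0 / endp
    maxfreq = 1.0 / startp
    nfreq = int(ceil((maxfreq - minfreq) / stepsize))

    return {
        "nphasebins": nphasebins,
        "stepsize": stepsize,
        "minfreq": minfreq,
        "maxfreq": maxfreq,
        "nfreq": nfreq,
    }


# Example: a 64-day timebase searched between 0.5 and 4.0 days.
grid = bls_autofreq_grid(64.0, startp=0.5, endp=4.0, mintransitduration=0.25)
```

Because stepsize scales inversely with the timebase, longer light curves produce proportionally denser frequency grids, which is one reason the parallel driver splits the grid into chunks.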

astrobase.periodbase.kbls.
bls_stats_singleperiod
(times, mags, errs, period, magsarefluxes=False, sigclip=10.0, perioddeltapercent=10, nphasebins=200, mintransitduration=0.01, maxtransitduration=0.4, ingressdurationfraction=0.1, verbose=True)[source]¶ This calculates the SNR, depth, duration, a refit period, and time of center-transit for a single period.
The equation used for SNR is:
SNR = (transit model depth / RMS of LC with transit model subtracted) * sqrt(number of points in transit)
NOTE: you should set the kwargs sigclip, nphasebins, mintransitduration, maxtransitduration to what you used for an initial BLS run to detect transits in the input light curve to match those input conditions.
Parameters:  times,mags,errs (np.array) – These contain the magnitude/flux timeseries and any associated errors.
 period (float) – The period to search around and refit the transits. This will be used to calculate the start and end periods of a rerun of BLS to calculate the stats.
 magsarefluxes (bool) – Set to True if the input measurements in mags are actually fluxes and not magnitudes.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elements removed) will be passed through to the output.
 perioddeltapercent (float) –
The percentage of the provided period to search on either side of it. The period range searched will then be:
[period - (perioddeltapercent/100.0)*period, period + (perioddeltapercent/100.0)*period]
 nphasebins (int) – The number of phase bins to use in the BLS run.
 mintransitduration (float) – The minimum transit duration in phase to consider.
 maxtransitduration (float) – The maximum transit duration to consider.
 ingressdurationfraction (float) – The fraction of the transit duration to use to generate an initial value of the transit ingress duration for the BLS model refit. This will be fit by this function.
 verbose (bool) – If True, will indicate progress and any problems encountered.
Returns: A dict of the following form is returned:
{'period': the refit best period,
 'epoch': the refit epoch (i.e. mid-transit time),
 'snr': the SNR of the transit,
 'transitdepth': the depth of the transit,
 'transitduration': the duration of the transit,
 'ingressduration': if trapezoid fit OK, is the ingress duration,
 'npoints_in_transit': the number of LC points in transit,
 'fit_status': 'ok' or 'trapezoid model fit failed,...',
 'nphasebins': the input value of nphasebins,
 'transingressbin': the phase bin containing transit ingress,
 'transegressbin': the phase bin containing transit egress,
 'blsmodel': the full BLS model used along with its parameters,
 'subtractedmags': BLS model - phased light curve,
 'phasedmags': the phase light curve,
 'phases': the phase values}
You should check the ‘fit_status’ key in this returned dict for a value of ‘ok’. If it is ‘trapezoid model fit failed, using box model’, you may not want to trust the transit period and epoch found.
Return type: dict
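The SNR equation quoted for this function can be written out as a small pure-Python sketch. It assumes the RMS is the root-mean-square of the model-subtracted residuals about zero; astrobase may measure the scatter differently. The names transit_snr and fit_is_trustworthy are hypothetical helpers, and only the 'fit_status' key is taken from the returned dict described above.

```python
from math import sqrt


def transit_snr(transit_depth, residuals, npoints_in_transit):
    """SNR = (depth / RMS of residuals) * sqrt(number of in-transit points)."""
    # Assumption: RMS is taken about zero, i.e. the residuals are already
    # the light curve with the transit model subtracted.
    rms = sqrt(sum(r * r for r in residuals) / len(residuals))
    return (transit_depth / rms) * sqrt(npoints_in_transit)


def fit_is_trustworthy(stats):
    """Check the documented 'fit_status' key of a stats dict."""
    return stats.get("fit_status") == "ok"


# Example with made-up numbers: depth 2.0, residual scatter 0.5, 16 points.
snr = transit_snr(2.0, [0.5, -0.5, 0.5, -0.5], 16)
```

The sqrt(number of points) factor means that, for the same depth and scatter, quadrupling the number of in-transit points doubles the SNR.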

astrobase.periodbase.kbls.
bls_snr
(blsdict, times, mags, errs, assumeserialbls=False, magsarefluxes=False, sigclip=10.0, npeaks=None, perioddeltapercent=10, ingressdurationfraction=0.1, verbose=True)[source]¶ Calculates the signal to noise ratio for each best peak in the BLS periodogram, along with transit depth, duration, and refit period and epoch.
The following equation is used for SNR:
SNR = (transit model depth / RMS of LC with transit model subtracted) * sqrt(number of points in transit)
Parameters:  blsdict (dict) –
This is an lspinfo dict produced by either bls_parallel_pfind or bls_serial_pfind in this module, or by your own BLS function. If you provide results in a dict from an external BLS function, make sure this matches the form below:
{'bestperiod': the best period value in the periodogram,
 'bestlspval': the periodogram peak associated with the best period,
 'nbestpeaks': the input value of nbestpeaks,
 'nbestlspvals': nbestpeaks-size list of best period peak values,
 'nbestperiods': nbestpeaks-size list of best periods,
 'lspvals': the full array of periodogram powers,
 'frequencies': the full array of frequencies considered,
 'periods': the full array of periods considered,
 'blsresult': list of result dicts from eebls.f wrapper functions,
 'stepsize': the actual stepsize used,
 'nfreq': the actual nfreq used,
 'nphasebins': the actual nphasebins used,
 'mintransitduration': the input mintransitduration,
 'maxtransitduration': the input maxtransitduration,
 'method': 'bls' -> the name of the period-finder method,
 'kwargs': {dict of all of the input kwargs for record-keeping}}
 times,mags,errs (np.array) – These contain the magnitude/flux timeseries and any associated errors.
 assumeserialbls (bool) – If this is True, this function will not rerun BLS around each best peak in the input lspinfo dict to refit the periods and epochs. The rerun is usually required for results from bls_parallel_pfind, so set this to False if you use results from that function. The parallel method breaks up the frequency space into chunks for speed, and the results may not exactly match those from a regular BLS run.
 magsarefluxes (bool) – Set to True if the input measurements in mags are actually fluxes and not magnitudes.
 npeaks (int or None) – This controls how many of the periods in blsdict[‘nbestperiods’] to find the SNR for. If it’s None, then this will calculate the SNR for all of them. If it’s an integer between 1 and len(blsdict[‘nbestperiods’]), will calculate for only the specified number of peak periods, starting from the best period.
 perioddeltapercent (float) –
The percentage of the provided period to search on either side of it. The period range searched will then be:
[period - (perioddeltapercent/100.0)*period, period + (perioddeltapercent/100.0)*period]
 ingressdurationfraction (float) – The fraction of the transit duration to use to generate an initial value of the transit ingress duration for the BLS model refit. This will be fit by this function.
 verbose (bool) – If True, will indicate progress and any problems encountered.
Returns: A dict of the following form is returned:
{'npeaks': the number of periodogram peaks requested to get SNR for,
 'period': list of refit best periods for each requested peak,
 'epoch': list of refit epochs (i.e. mid-transit times),
 'snr': list of SNRs of the transit for each requested peak,
 'transitdepth': list of depths of the transits,
 'transitduration': list of durations of the transits,
 'nphasebins': the input value of nphasebins,
 'transingressbin': the phase bin containing transit ingress,
 'transegressbin': the phase bin containing transit egress,
 'allblsmodels': the full BLS models used along with their parameters,
 'allsubtractedmags': BLS models - phased light curves,
 'allphasedmags': the phase light curves,
 'allphases': the phase values}
Return type: dict
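The perioddeltapercent window and the npeaks selection described above can be sketched as below. Both helper names are hypothetical, and the fallback to all periods when npeaks is out of range is an assumption rather than documented behaviour.

```python
def period_search_window(period, perioddeltapercent=10):
    """Return the (start, end) period range searched around `period`."""
    delta = (perioddeltapercent / 100.0) * period
    return period - delta, period + delta


def periods_to_refit(nbestperiods, npeaks=None):
    """Pick the periods to compute SNRs for, starting from the best one."""
    # Assumption: an npeaks outside 1..len(nbestperiods) means "use all".
    if npeaks is None or not (1 <= npeaks <= len(nbestperiods)):
        return list(nbestperiods)
    return list(nbestperiods[:npeaks])


# Example: refit windows for the two strongest of three candidate periods.
windows = [period_search_window(p, 25) for p in periods_to_refit([2.0, 4.0, 8.0], 2)]
```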
astrobase.periodbase.htls module¶
Contains the Hippke & Heller (2019) transit-least-squares period-search algorithm implementation for periodbase. This depends on the external package written by Hippke & Heller, https://github.com/hippke/tls.

astrobase.periodbase.htls.
tls_parallel_pfind
(times, mags, errs, magsarefluxes=None, startp=0.1, endp=None, tls_oversample=5, tls_mintransits=3, tls_transit_template='default', tls_rstar_min=0.13, tls_rstar_max=3.5, tls_mstar_min=0.1, tls_mstar_max=2.0, periodepsilon=0.1, nbestpeaks=5, sigclip=10.0, verbose=True, nworkers=None)[source]¶ Wrapper to Hippke & Heller (2019)’s “transit least squares”, which is BLS, but with a slightly better template (and niceties in the implementation).
A few comments:
The time series must be in units of days.
The frequency sampling Hippke & Heller (2019) advocate for is cubic in frequencies, instead of linear. Ofir (2014) found that the linear-in-frequency sampling (which is correct for sinusoidal signal detection) isn’t optimal for a Keplerian box signal. He gave an equation for “optimal” sampling. tls_oversample is the factor by which to oversample over that. The grid can be imported independently via:
from transitleastsquares import period_grid
The spacing equations are given here: https://transitleastsquares.readthedocs.io/en/latest/Python%20interface.html#periodgrid
The boundaries of the period search are by default 0.1 day to 99% of the baseline of times.
Parameters:  times,mags,errs (np.array) – The magnitude/flux timeseries to search for transits.
 magsarefluxes (bool) – transitleastsquares requires fluxes. Therefore if magsarefluxes is set to false, the passed mags are converted to fluxes. All output dictionary vectors include fluxes, not mags.
 startp,endp (float) – The minimum and maximum periods to consider for the transit search.
 tls_oversample (int) – Factor by which to oversample the frequency grid.
 tls_mintransits (int) – Sets the min_n_transits kwarg for the BoxLeastSquares.autoperiod() function.
 tls_transit_template (str) – default, grazing, or box.
 tls_rstar_min,tls_rstar_max (float) – The range of stellar radii to consider when generating a frequency grid. In units of Rsun.
 tls_mstar_min,tls_mstar_max (float) – The range of stellar masses to consider when generating a frequency grid. In units of Msun.
 periodepsilon (float) – The fractional difference between successive values of ‘best’ periods when sorting by periodogram power to consider them as separate periods (as opposed to part of the same periodogram peak). This is used to avoid broad peaks in the periodogram and make sure the ‘best’ periods returned are all actually independent.
 nbestpeaks (int) – The number of ‘best’ peaks to return from the periodogram results, starting from the global maximum of the periodogram peak values.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elements removed) will be passed through to the output.
 verbose (bool) – Kept for consistency with periodbase functions.
 nworkers (int or None) – The number of parallel workers to launch for the period search. If None, nworkers = NCPUS.
Returns: This function returns a dict, referred to as an lspinfo dict in other astrobase functions that operate on periodogram results. The format is similar to the other astrobase periodfinders – it contains the nbestpeaks, which is the most important thing. (But isn’t entirely standardized.)
Crucially, it also contains “tlsresult”, which is a dictionary with transitleastsquares spectra (used to get the SDE as defined in the TLS paper), statistics, transit period, midtime, duration, depth, SNR, and the “odd_even_mismatch” statistic. The full key list is:
dict_keys(['SDE', 'SDE_raw', 'chi2_min', 'chi2red_min', 'period', 'period_uncertainty', 'T0', 'duration', 'depth', 'depth_mean', 'depth_mean_even', 'depth_mean_odd', 'transit_depths', 'transit_depths_uncertainties', 'rp_rs', 'snr', 'snr_per_transit', 'snr_pink_per_transit', 'odd_even_mismatch', 'transit_times', 'per_transit_count', 'transit_count', 'distinct_transit_count', 'empty_transit_count', 'FAP', 'in_transit_count', 'after_transit_count', 'before_transit_count', 'periods', 'power', 'power_raw', 'SR', 'chi2', 'chi2red', 'model_lightcurve_time', 'model_lightcurve_model', 'model_folded_phase', 'folded_y', 'folded_dy', 'folded_phase', 'model_folded_model'])
The descriptions are here:
https://transitleastsquares.readthedocs.io/en/latest/Python%20interface.html#returnvalues
The remaining resultdict is:
resultdict = {
 'tlsresult': tlsresult,
 'bestperiod': the best period value in the periodogram,
 'bestlspval': the peak associated with the best period,
 'nbestpeaks': the input value of nbestpeaks,
 'nbestlspvals': nbestpeaks-size list of best period peak values,
 'nbestperiods': nbestpeaks-size list of best periods,
 'lspvals': the full array of periodogram powers,
 'periods': the full array of periods considered,
 'tlsresult': Astropy tls result object (BoxLeastSquaresResult),
 'tlsmodel': Astropy tls BoxLeastSquares object used for work,
 'method': 'tls' -> the name of the period-finder method,
 'kwargs': {dict of all of the input kwargs for record-keeping}
}
Return type: dict
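Two of the conventions noted for this wrapper, the default period bounds and the conversion of magnitudes to fluxes, can be sketched as below. Both helpers are hypothetical: the conversion assumes the standard Pogson relation normalized to the median magnitude, which may not match what astrobase does internally.

```python
from statistics import median


def default_tls_period_bounds(times, startp=0.1, endp=None):
    """Default search bounds: 0.1 day up to 99% of the time baseline."""
    baseline = max(times) - min(times)
    if endp is None:
        endp = 0.99 * baseline
    return startp, endp


def mags_to_relative_flux(mags):
    """Convert magnitudes to fluxes relative to the median magnitude.

    Assumption: flux = 10**(-0.4 * (mag - median_mag)), so the median
    magnitude maps to a relative flux of 1.0.
    """
    reference = median(mags)
    return [10.0 ** (-0.4 * (m - reference)) for m in mags]


# Example: a 100-day baseline and a 2.5 mag dimming.
bounds = default_tls_period_bounds([0.0, 50.0, 100.0])
fluxes = mags_to_relative_flux([10.0, 10.0, 12.5])
```

Note that a larger magnitude maps to a smaller flux, which is why the magsarefluxes setting changes which direction counts as a dimming.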
astrobase.periodbase.spdm module¶
Contains the Stellingwerf (1978) phase-dispersion minimization period-search algorithm implementation for periodbase.

astrobase.periodbase.spdm.
stellingwerf_pdm_theta
(times, mags, errs, frequency, binsize=0.05, minbin=9)[source]¶ This calculates the Stellingwerf PDM theta value at a test frequency.
Parameters:  times,mags,errs (np.array) – The input timeseries and associated errors.
 frequency (float) – The test frequency to calculate the theta statistic at.
 binsize (float) – The phase bin size to use.
 minbin (int) – The minimum number of items in a phase bin to consider in the calculation of the statistic.
Returns: theta_pdm – The value of the theta statistic at the specified frequency.
Return type: float
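For orientation, an unweighted PDM theta can be sketched in pure Python: fold the series at the test frequency, bin in phase, and take the ratio of the pooled in-bin variance to the overall variance. This is a simplified illustration with non-overlapping bins and no use of errs; it is not the astrobase implementation, and the rule that a bin counts only if it holds at least minbin points is an assumption.

```python
from math import pi, sin
from statistics import variance


def pdm_theta_sketch(times, mags, frequency, binsize=0.05, minbin=9):
    """Pooled in-bin variance divided by total variance (unweighted)."""
    nbins = int(round(1.0 / binsize))
    bins = [[] for _ in range(nbins)]
    for t, m in zip(times, mags):
        phase = (t * frequency) % 1.0
        index = min(int(phase / binsize), nbins - 1)
        bins[index].append(m)

    # Assumption: only bins with at least `minbin` points (and >= 2) count.
    used = [b for b in bins if len(b) >= max(minbin, 2)]
    npoints = sum(len(b) for b in used)
    if npoints <= len(used):
        raise ValueError("not enough points per phase bin")

    pooled = sum((len(b) - 1) * variance(b) for b in used) / (npoints - len(used))
    return pooled / variance(mags)


# Example: a noiseless sinusoid with a true frequency of 1.0 cycles/day.
times = [0.01 * i for i in range(400)]
mags = [sin(2.0 * pi * t) for t in times]
theta_true = pdm_theta_sketch(times, mags, 1.0)
theta_wrong = pdm_theta_sketch(times, mags, 0.731)
```

Theta is small when folding at the correct frequency, because each phase bin then holds nearly identical values, and approaches 1 when the fold scrambles the signal.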

astrobase.periodbase.spdm.
stellingwerf_pdm
(times, mags, errs, magsarefluxes=False, startp=None, endp=None, stepsize=0.0001, autofreq=True, normalize=False, phasebinsize=0.05, mindetperbin=9, nbestpeaks=5, periodepsilon=0.1, sigclip=10.0, nworkers=None, verbose=True)[source]¶ This runs a parallelized Stellingwerf phase-dispersion minimization (PDM) period search.
Parameters:  times,mags,errs (np.array) – The mag/flux timeseries with associated measurement errors to run the period-finding on.
 magsarefluxes (bool) – If the input measurement values in mags and errs are in fluxes, set this to True.
 startp,endp (float or None) – The minimum and maximum periods to consider for the period search.
 stepsize (float) – The stepsize in frequency to use when constructing a frequency grid for the period search.
 autofreq (bool) – If this is True, the value of stepsize will be ignored and the astrobase.periodbase.get_frequency_grid() function will be used to generate a frequency grid based on startp and endp. If these are None as well, startp will be set to 0.1 and endp will be set to times.max() - times.min().
 normalize (bool) – This sets if the input timeseries is normalized to 0.0 and rescaled such that its variance = 1.0. This is the recommended procedure by Schwarzenberg-Czerny 1996.
 phasebinsize (float) – The bin size in phase to use when calculating the PDM theta statistic at a test frequency.
 mindetperbin (int) – The minimum number of elements in a phase bin to consider it valid when calculating the PDM theta statistic at a test frequency.
 nbestpeaks (int) – The number of ‘best’ peaks to return from the periodogram results, starting from the global maximum of the periodogram peak values.
 periodepsilon (float) – The fractional difference between successive values of ‘best’ periods when sorting by periodogram power to consider them as separate periods (as opposed to part of the same periodogram peak). This is used to avoid broad peaks in the periodogram and make sure the ‘best’ periods returned are all actually independent.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elements removed) will be passed through to the output.
 nworkers (int) – The number of parallel workers to use when calculating the periodogram.
 verbose (bool) – If this is True, will indicate progress and details about the frequency grid used for the period search.
Returns: This function returns a dict, referred to as an lspinfo dict in other astrobase functions that operate on periodogram results. This is a standardized format across all astrobase periodfinders, and is of the form below:
{'bestperiod': the best period value in the periodogram,
 'bestlspval': the periodogram peak associated with the best period,
 'nbestpeaks': the input value of nbestpeaks,
 'nbestlspvals': nbestpeaks-size list of best period peak values,
 'nbestperiods': nbestpeaks-size list of best periods,
 'lspvals': the full array of periodogram powers,
 'periods': the full array of periods considered,
 'method': 'pdm' -> the name of the period-finder method,
 'kwargs': {dict of all of the input kwargs for record-keeping}}
Return type: dict
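The sigclip semantics described for the period-finders (symmetric, asymmetric, or None) can be illustrated with the sketch below. It is a simplified stand-in that clips only a list of values around the median using the population standard deviation; how astrobase estimates the center and scatter is not stated here, so both choices are assumptions, and sigclip_sketch is a hypothetical name.

```python
from math import isfinite
from statistics import median, pstdev


def sigclip_sketch(values, sigclip=10.0, magsarefluxes=False):
    """Drop non-finite values, then clip dimmings/brightenings separately."""
    finite = [v for v in values if isfinite(v)]
    if sigclip is None:
        return finite

    center = median(finite)      # assumption: clip around the median
    scatter = pstdev(finite)     # assumption: plain standard deviation

    if isinstance(sigclip, (int, float)):
        dim_sigma = bright_sigma = float(sigclip)
    else:
        dim_sigma, bright_sigma = sigclip

    if magsarefluxes:
        # In fluxes, dimmer means a smaller value.
        lower = center - dim_sigma * scatter
        upper = center + bright_sigma * scatter
    else:
        # In magnitudes, dimmer means a larger value.
        lower = center - bright_sigma * scatter
        upper = center + dim_sigma * scatter

    return [v for v in finite if lower <= v <= upper]


# Example: one high outlier, one low outlier, and one NaN.
values = [10.0] * 20 + [10.5, 9.5, float("nan")]
kept_as_mags = sigclip_sketch(values, sigclip=[5.0, 2.0])
kept_as_flux = sigclip_sketch(values, sigclip=[5.0, 2.0], magsarefluxes=True)
```

With the same sigclip=[5.0, 2.0], the value 10.5 survives when the data are magnitudes (it is a dimming) but is removed when they are fluxes (it is a brightening), which is the reason magsarefluxes must be set correctly.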

astrobase.periodbase.spdm.
analytic_false_alarm_probability
(lspinfo, times, conservative_nfreq_eff=True, peakvals=None, inplace=True)[source]¶ This returns the analytic false alarm probabilities for periodogram peak values.
FIXME: this doesn’t actually work. Fix later.
The calculation follows that on page 3 of Zechmeister & Kurster (2009):
FAP = 1 − [1 − Prob(z > z0)]**M
where:
M is the number of independent frequencies
Prob(z > z0) is the probability of peak with value > z0
z0 is the peak value we're evaluating
For PDM, the Prob(z > z0) is described by the beta distribution, according to:
 Schwarzenberg-Czerny (1997; https://ui.adsabs.harvard.edu/#abs/1997ApJ...489..941S)
 Zalian, Chadid, and Stellingwerf (2013; http://adsabs.harvard.edu/abs/2014MNRAS.440...68Z)
This is given by:
beta( (N-B)/2, (B-1)/2; ((N-B)/(B-1))*theta_pdm )
Where:
N = number of observations
B = number of phase bins
This translates to a scipy.stats call to the beta distribution CDF:
x = ((N-B)/(B-1))*theta_pdm_best
prob_exceeds_val = scipy.stats.beta.cdf(x, (N-B)/2.0, (B-1.0)/2.0)
Which we can then plug into the false alarm prob eqn above with the calculation of M.
Parameters:  lspinfo (dict) – The dict returned by the stellingwerf_pdm() function.
 times (np.array) – The times for which the periodogram result in lspinfo was calculated.
 conservative_nfreq_eff (bool) –
If True, will follow the prescription given in Schwarzenberg-Czerny (2003):
http://adsabs.harvard.edu/abs/2003ASPC..292..383S
and estimate the effective number of independent frequencies M_eff as:
min(N_obs, N_freq, DELTA_f/delta_f)
 peakvals (sequence or None) – The peak values for which to evaluate the false-alarm probability. If None, will calculate this for each of the peak values in the nbestpeaks key of the lspinfo dict.
 inplace (bool) – If True, puts the results of the FAP calculation into the lspinfo dict as a list available as lspinfo['falsealarmprob'].
Returns: The calculated false alarm probabilities for each of the peak values in peakvals.
Return type: list
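The final step of the calculation, turning a single-frequency probability into a false alarm probability over M independent frequencies, is simple enough to sketch directly. The function name is hypothetical and the single-trial probability is taken as an input here; computing it from the beta distribution would need scipy.stats and is left out.

```python
def false_alarm_probability(prob_exceeds_val, n_independent_freqs):
    """FAP = 1 - [1 - Prob(z > z0)]**M for M independent frequencies."""
    return 1.0 - (1.0 - prob_exceeds_val) ** n_independent_freqs


# Example: a 1% single-trial probability over 10 and 100 frequencies.
fap_10 = false_alarm_probability(0.01, 10)
fap_100 = false_alarm_probability(0.01, 100)
```

The same single-trial probability becomes much less significant as M grows, which is why the estimate of the number of independent frequencies matters.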
astrobase.periodbase.saov module¶
Contains the Schwarzenberg-Czerny Analysis of Variance period-search algorithm implementation for periodbase.

astrobase.periodbase.saov.
aov_theta
(times, mags, errs, frequency, binsize=0.05, minbin=9)[source]¶ Calculates the Schwarzenberg-Czerny AoV statistic at a test frequency.
Parameters:  times,mags,errs (np.array) – The input timeseries and associated errors.
 frequency (float) – The test frequency to calculate the theta statistic at.
 binsize (float) – The phase bin size to use.
 minbin (int) – The minimum number of items in a phase bin to consider in the calculation of the statistic.
Returns: theta_aov – The value of the AoV statistic at the specified frequency.
Return type: float
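As a rough illustration of the statistic, the sketch below computes an unweighted phase-binned AoV theta as the ratio of the between-bin variance to the within-bin variance. It ignores errs, uses non-overlapping bins, and assumes a bin counts only if it holds at least minbin points, so it should be read as a simplified stand-in rather than the astrobase code.

```python
from math import pi, sin


def aov_theta_sketch(times, mags, frequency, binsize=0.05, minbin=9):
    """Between-bin variance divided by within-bin variance (unweighted)."""
    nbins = int(round(1.0 / binsize))
    bins = [[] for _ in range(nbins)]
    for t, m in zip(times, mags):
        phase = (t * frequency) % 1.0
        bins[min(int(phase / binsize), nbins - 1)].append(m)

    # Assumption: only bins with at least `minbin` points are used.
    used = [b for b in bins if len(b) >= max(minbin, 1)]
    npoints = sum(len(b) for b in used)
    if len(used) < 2 or npoints <= len(used):
        raise ValueError("not enough populated phase bins")

    grand_mean = sum(sum(b) for b in used) / npoints
    bin_means = [sum(b) / len(b) for b in used]

    between = sum(
        len(b) * (mean - grand_mean) ** 2 for b, mean in zip(used, bin_means)
    ) / (len(used) - 1)
    within = sum(
        (m - mean) ** 2 for b, mean in zip(used, bin_means) for m in b
    ) / (npoints - len(used))
    return between / within


# Example: a noiseless sinusoid with a true frequency of 1.0 cycles/day.
times = [0.01 * i for i in range(400)]
mags = [sin(2.0 * pi * t) for t in times]
theta_true = aov_theta_sketch(times, mags, 1.0)
theta_wrong = aov_theta_sketch(times, mags, 0.731)
```

Unlike the PDM theta, which is minimized at the best period, this statistic is large when the fold concentrates similar values in each bin, so peaks mark candidate periods.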

astrobase.periodbase.saov.
aov_periodfind
(times, mags, errs, magsarefluxes=False, startp=None, endp=None, stepsize=0.0001, autofreq=True, normalize=True, phasebinsize=0.05, mindetperbin=9, nbestpeaks=5, periodepsilon=0.1, sigclip=10.0, nworkers=None, verbose=True)[source]¶ This runs a parallelized Analysis-of-Variance (AoV) period search.
NOTE: normalize = True here as recommended by Schwarzenberg-Czerny 1996, i.e. mags will be normalized to zero and rescaled so their variance = 1.0.
Parameters:  times,mags,errs (np.array) – The mag/flux timeseries with associated measurement errors to run the period-finding on.
 magsarefluxes (bool) – If the input measurement values in mags and errs are in fluxes, set this to True.
 startp,endp (float or None) – The minimum and maximum periods to consider for the period search.
 stepsize (float) – The stepsize in frequency to use when constructing a frequency grid for the period search.
 autofreq (bool) – If this is True, the value of stepsize will be ignored and the astrobase.periodbase.get_frequency_grid() function will be used to generate a frequency grid based on startp and endp. If these are None as well, startp will be set to 0.1 and endp will be set to times.max() - times.min().
 normalize (bool) – This sets if the input timeseries is normalized to 0.0 and rescaled such that its variance = 1.0. This is the recommended procedure by Schwarzenberg-Czerny 1996.
 phasebinsize (float) – The bin size in phase to use when calculating the AoV theta statistic at a test frequency.
 mindetperbin (int) – The minimum number of elements in a phase bin to consider it valid when calculating the AoV theta statistic at a test frequency.
 nbestpeaks (int) – The number of ‘best’ peaks to return from the periodogram results, starting from the global maximum of the periodogram peak values.
 periodepsilon (float) – The fractional difference between successive values of ‘best’ periods when sorting by periodogram power to consider them as separate periods (as opposed to part of the same periodogram peak). This is used to avoid broad peaks in the periodogram and make sure the ‘best’ periods returned are all actually independent.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elements removed) will be passed through to the output.
 nworkers (int) – The number of parallel workers to use when calculating the periodogram.
 verbose (bool) – If this is True, will indicate progress and details about the frequency grid used for the period search.
Returns: This function returns a dict, referred to as an lspinfo dict in other astrobase functions that operate on periodogram results. This is a standardized format across all astrobase periodfinders, and is of the form below:
{'bestperiod': the best period value in the periodogram,
 'bestlspval': the periodogram peak associated with the best period,
 'nbestpeaks': the input value of nbestpeaks,
 'nbestlspvals': nbestpeaks-size list of best period peak values,
 'nbestperiods': nbestpeaks-size list of best periods,
 'lspvals': the full array of periodogram powers,
 'periods': the full array of periods considered,
 'method': 'aov' -> the name of the period-finder method,
 'kwargs': {dict of all of the input kwargs for record-keeping}}
Return type: dict

astrobase.periodbase.saov.
analytic_false_alarm_probability
(lspinfo, times, conservative_nfreq_eff=True, peakvals=None, inplace=True)[source]¶ This returns the analytic false alarm probabilities for periodogram peak values.
FIXME: this doesn’t actually work. Fix later.
The calculation follows that on page 3 of Zechmeister & Kurster (2009):
FAP = 1 − [1 − Prob(z > z0)]**M
where:
M is the number of independent frequencies
Prob(z > z0) is the probability of peak with value > z0
z0 is the peak value we're evaluating
For AoV and AoV-harmonic, the Prob(z > z0) is described by the F distribution, according to:
 Schwarzenberg-Czerny (1997; https://ui.adsabs.harvard.edu/#abs/1997ApJ...489..941S)
This is given by:
F( (B-1), (N-B); theta_aov )
Where:
N = number of observations
B = number of phase bins
This translates to a scipy.stats call to the F distribution CDF:
x = theta_aov_best
prob_exceeds_val = scipy.stats.f.cdf(x, (B-1.0), (N-B))
Which we can then plug into the false alarm prob eqn above with the calculation of M.
Parameters:  lspinfo (dict) – The dict returned by the aov_periodfind() function.
 times (np.array) – The times for which the periodogram result in lspinfo was calculated.
 conservative_nfreq_eff (bool) –
If True, will follow the prescription given in Schwarzenberg-Czerny (2003):
http://adsabs.harvard.edu/abs/2003ASPC..292..383S
and estimate the effective number of independent frequencies M_eff as:
min(N_obs, N_freq, DELTA_f/delta_f)
 peakvals (sequence or None) – The peak values for which to evaluate the false-alarm probability. If None, will calculate this for each of the peak values in the nbestpeaks key of the lspinfo dict.
 inplace (bool) – If True, puts the results of the FAP calculation into the lspinfo dict as a list available as lspinfo['falsealarmprob'].
Returns: The calculated false alarm probabilities for each of the peak values in peakvals.
Return type: list
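The conservative estimate of the number of independent frequencies, min(N_obs, N_freq, DELTA_f/delta_f), can be sketched as follows. The reading of DELTA_f as the span of the searched frequencies and delta_f as 1/timebase is an assumption, since the text above does not define them, and the function name is hypothetical.

```python
def effective_independent_frequencies(times, frequencies):
    """Conservative M_eff = min(N_obs, N_freq, DELTA_f/delta_f)."""
    timebase = max(times) - min(times)
    # Assumption: DELTA_f is the searched frequency span and
    # delta_f is the natural frequency resolution, 1/timebase.
    freq_span = max(frequencies) - min(frequencies)
    delta_f = 1.0 / timebase
    return min(len(times), len(frequencies), freq_span / delta_f)


# Example: 17 observations over 8 days, 9 frequencies spanning 0.5 c/d.
times = [0.5 * i for i in range(17)]
frequencies = [1.0 + 0.0625 * i for i in range(9)]
m_eff = effective_independent_frequencies(times, frequencies)
```

Taking the minimum keeps M from exceeding either the number of data points or the number of frequencies actually tested.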
astrobase.periodbase.smav module¶
Contains the Schwarzenberg-Czerny Analysis of Variance period-search algorithm implementation for periodbase. This uses the multi-harmonic version presented in Schwarzenberg-Czerny (1996).

astrobase.periodbase.smav.
aovhm_theta
(times, mags, errs, frequency, nharmonics, magvariance)[source]¶ This calculates the harmonic AoV theta statistic for a frequency.
This is a mostly faithful translation of the inner loop in aovper.f90. See the following for details:
 http://users.camk.edu.pl/alex/
 Schwarzenberg-Czerny (1996)
Schwarzenberg-Czerny (1996) equation 11:
theta_prefactor = (K - 2N - 1)/(2N)
theta_top = sum(c_n*c_n) (from n=0 to n=2N)
theta_bot = variance(timeseries) - sum(c_n*c_n) (from n=0 to n=2N)
theta = theta_prefactor * (theta_top/theta_bot)
N = number of harmonics (nharmonics)
K = length of time series (times.size)
Parameters:  times,mags,errs (np.array) – The input timeseries to calculate the test statistic for. These should all be free of nans/infs and be normalized to zero.
 frequency (float) – The test frequency to calculate the statistic for.
 nharmonics (int) – The number of harmonics to calculate up to. The recommended range is 4 to 8.
 magvariance (float) – This is the (weighted by errors) variance of the magnitude time series. We provide it as a pre-calculated value here so we don’t have to re-calculate it for every worker.
Returns: aov_harmonic_theta – The value of the harmonic AoV theta for the specified test frequency.
Return type: float
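To make the structure of the theta statistic concrete, here is a rough sketch that evaluates the same ratio using an ordinary least-squares Fourier fit. This is a swapped-in technique, not the orthogonal-polynomial recurrence of aovper.f90: it ignores the measurement errors, and the explained sum of squares stands in for sum(c_n*c_n). The function name and the test data are hypothetical.

```python
import numpy as np


def harmonic_aov_theta_lstsq(times, mags, frequency, nharmonics):
    """Harmonic AoV theta from an unweighted least-squares Fourier fit."""
    K = times.size
    N = nharmonics
    phase = 2.0 * np.pi * frequency * times
    y = mags - mags.mean()          # normalize the series to zero mean

    # design matrix: a constant term plus N cosine/sine pairs
    columns = [np.ones(K)]
    for n in range(1, N + 1):
        columns.append(np.cos(n * phase))
        columns.append(np.sin(n * phase))
    design = np.column_stack(columns)

    coeffs, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    model = design @ coeffs

    theta_top = np.sum(model * model)       # sum of squares explained by the fit
    theta_bot = np.sum(y * y) - theta_top   # sum of squares left in the residuals
    return (K - 2 * N - 1) / (2.0 * N) * theta_top / theta_bot


# synthetic sinusoid with a 2.5-day period plus Gaussian noise
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0.0, 30.0, 400))
m = 0.1 * np.sin(2.0 * np.pi * t / 2.5) + rng.normal(0.0, 0.02, t.size)

theta_true = harmonic_aov_theta_lstsq(t, m, 1.0 / 2.5, nharmonics=4)
theta_off = harmonic_aov_theta_lstsq(t, m, 1.0 / 3.7, nharmonics=4)
```

The statistic is large when the harmonic model at the trial frequency absorbs most of the variance, which is why theta_true comes out far above theta_off here.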

astrobase.periodbase.smav.
aovhm_periodfind
(times, mags, errs, magsarefluxes=False, startp=None, endp=None, stepsize=0.0001, autofreq=True, normalize=True, nharmonics=6, nbestpeaks=5, periodepsilon=0.1, sigclip=10.0, nworkers=None, verbose=True)[source]¶ This runs a parallelized harmonic Analysis-of-Variance (AoV) period search.
NOTE: normalize = True here as recommended by Schwarzenberg-Czerny 1996, i.e. mags will be normalized to zero and rescaled so their variance = 1.0.
Parameters:  times,mags,errs (np.array) – The mag/flux timeseries with associated measurement errors to run the period-finding on.
 magsarefluxes (bool) – If the input measurement values in mags and errs are in fluxes, set this to True.
 startp,endp (float or None) – The minimum and maximum periods to consider for the period search.
 stepsize (float) – The step-size in frequency to use when constructing a frequency grid for the period search.
 autofreq (bool) – If this is True, the value of stepsize will be ignored and the astrobase.periodbase.get_frequency_grid() function will be used to generate a frequency grid based on startp, and endp. If these are None as well, startp will be set to 0.1 and endp will be set to times.max() - times.min().
 normalize (bool) – This sets if the input timeseries is normalized to 0.0 and rescaled such that its variance = 1.0. This is the recommended procedure by Schwarzenberg-Czerny 1996.
 nharmonics (int) – The number of harmonics to use when calculating the AoV theta value at a test frequency. This should be between 4 and 8 in most cases.
 nbestpeaks (int) – The number of ‘best’ peaks to return from the periodogram results, starting from the global maximum of the periodogram peak values.
 periodepsilon (float) – The fractional difference between successive values of ‘best’ periods when sorting by periodogram power to consider them as separate periods (as opposed to part of the same periodogram peak). This is used to avoid broad peaks in the periodogram and make sure the ‘best’ periods returned are all actually independent.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 nworkers (int) – The number of parallel workers to use when calculating the periodogram.
 verbose (bool) – If this is True, will indicate progress and details about the frequency grid used for the period search.
Returns: This function returns a dict, referred to as an lspinfo dict in other astrobase functions that operate on periodogram results. This is a standardized format across all astrobase period-finders, and is of the form below:
{'bestperiod': the best period value in the periodogram,
 'bestlspval': the periodogram peak associated with the best period,
 'nbestpeaks': the input value of nbestpeaks,
 'nbestlspvals': nbestpeaks-size list of best period peak values,
 'nbestperiods': nbestpeaks-size list of best periods,
 'lspvals': the full array of periodogram powers,
 'periods': the full array of periods considered,
 'method':'mav' -> the name of the period-finder method,
 'kwargs':{ dict of all of the input kwargs for record-keeping}}
Return type: dict
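The role of nbestpeaks and periodepsilon can be illustrated with a short sketch. This is an assumption about how such a filter could work, not the library's exact logic: candidates are visited in order of decreasing periodogram value, and one is accepted only if it differs from every previously accepted period by more than periodepsilon times that period. The helper name is hypothetical.

```python
def pick_best_peaks(periods, lspvals, nbestpeaks=5, periodepsilon=0.1):
    """Return up to nbestpeaks (period, value) pairs, highest value first.

    A candidate is kept only if it differs from every already accepted
    period by more than periodepsilon times that accepted period.
    """
    order = sorted(range(len(periods)), key=lambda i: lspvals[i], reverse=True)
    best = []
    for i in order:
        if len(best) == nbestpeaks:
            break
        candidate = periods[i]
        if all(abs(candidate - p) > periodepsilon * p for p, _ in best):
            best.append((candidate, lspvals[i]))
    return best


# 1.02 and 1.05 sit on the shoulder of the 1.00 peak, and 2.10 on the
# shoulder of the 2.00 peak, so they are skipped.
periods = [1.00, 1.02, 1.05, 2.00, 2.10, 3.50]
lspvals = [9.0, 8.5, 8.0, 7.0, 6.5, 3.0]
peaks = pick_best_peaks(periods, lspvals, nbestpeaks=3)
```

Without this filter, a single broad peak would fill most of the 'best' list with near-duplicates of the same period.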

astrobase.periodbase.smav.
analytic_false_alarm_probability
(lspinfo, times, conservative_nfreq_eff=True, peakvals=None, inplace=True)[source]¶ This returns the analytic false alarm probabilities for periodogram peak values.
FIXME: this doesn’t actually work. Fix later.
The calculation follows that on page 3 of Zechmeister & Kurster (2009):
FAP = 1 − [1 − Prob(z > z0)]**M
where:
M is the number of independent frequencies
Prob(z > z0) is the probability of a peak with value > z0
z0 is the peak value we're evaluating
For AoV and AoV-harmonic, the Prob(z > z0) is described by the F distribution, according to:
 Schwarzenberg-Czerny (1997; https://ui.adsabs.harvard.edu/#abs/1997ApJ…489..941S)
 Schwarzenberg-Czerny (1996; http://adsabs.harvard.edu/abs/1996ApJ…460L.107S)
This is given by:
F( 2N, K - 2N - 1; theta_aov )
Where:
N = number of harmonics used for AOV_harmonic
K = number of observations
This translates to a scipy.stats call to the F distribution CDF:
x = theta_aov_best
prob_exceeds_val = scipy.stats.f.cdf(x, 2N, K - 2N - 1)
Which we can then plug into the false alarm prob eqn above with the calculation of M.
Parameters:  lspinfo (dict) – The dict returned by the aovhm_periodfind() function.
 times (np.array) – The times for which the periodogram result in lspinfo was calculated.
 conservative_nfreq_eff (bool) –
If True, will follow the prescription given in Schwarzenberg-Czerny (2003):
http://adsabs.harvard.edu/abs/2003ASPC..292..383S
and estimate the effective number of independent frequencies M_eff as:
min(N_obs, N_freq, DELTA_f/delta_f)
 peakvals (sequence or None) – The peak values for which to evaluate the false-alarm probability. If None, will calculate this for each of the peak values in the nbestpeaks key of the lspinfo dict.
 inplace (bool) – If True, puts the results of the FAP calculation into the lspinfo dict as a list available as lspinfo['falsealarmprob'].
Returns: The calculated false alarm probabilities for each of the peak values in peakvals.
Return type: list
astrobase.periodbase.zgls module¶
Contains the Zechmeister & Kurster (2009) Generalized Lomb-Scargle period-search algorithm implementation for periodbase.

astrobase.periodbase.zgls.
generalized_lsp_value
(times, mags, errs, omega)[source]¶ Generalized LSP value for a single omega.
The relations used are:
P(w) = (1/YY) * (YC*YC/CC + YS*YS/SS)
where: YC, YS, CC, and SS are all calculated at T
and where: tan 2omegaT = 2*CS/(CC - SS)
and where:
Y = sum( w_i*y_i )
C = sum( w_i*cos(wT_i) )
S = sum( w_i*sin(wT_i) )
YY = sum( w_i*y_i*y_i ) - Y*Y
YC = sum( w_i*y_i*cos(wT_i) ) - Y*C
YS = sum( w_i*y_i*sin(wT_i) ) - Y*S
CpC = sum( w_i*cos(w_T_i)*cos(w_T_i) )
CC = CpC - C*C
SS = (1 - CpC) - S*S
CS = sum( w_i*cos(w_T_i)*sin(w_T_i) ) - C*S
Parameters:  times,mags,errs (np.array) – The timeseries to calculate the periodogram value for.
 omega (float) – The frequency to calculate the periodogram value at.
Returns: periodogramvalue – The normalized periodogram at the specified test frequency omega.
Return type: float

astrobase.periodbase.zgls.
generalized_lsp_value_withtau
(times, mags, errs, omega)[source]¶ Generalized LSP value for a single omega.
This uses tau to provide an arbitrary timereference point.
The relations used are:
P(w) = (1/YY) * (YC*YC/CC + YS*YS/SS)
where: YC, YS, CC, and SS are all calculated at T
and where: tan 2omegaT = 2*CS/(CC - SS)
and where:
Y = sum( w_i*y_i )
C = sum( w_i*cos(wT_i) )
S = sum( w_i*sin(wT_i) )
YY = sum( w_i*y_i*y_i ) - Y*Y
YC = sum( w_i*y_i*cos(wT_i) ) - Y*C
YS = sum( w_i*y_i*sin(wT_i) ) - Y*S
CpC = sum( w_i*cos(w_T_i)*cos(w_T_i) )
CC = CpC - C*C
SS = (1 - CpC) - S*S
CS = sum( w_i*cos(w_T_i)*sin(w_T_i) ) - C*S
Parameters:  times,mags,errs (np.array) – The timeseries to calculate the periodogram value for.
 omega (float) – The frequency to calculate the periodogram value at.
Returns: periodogramvalue – The normalized periodogram at the specified test frequency omega.
Return type: float
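The relations above can be written out as a short NumPy sketch. This is an illustration under stated assumptions rather than the library's worker function: the weights are taken as normalized inverse variances, tau is obtained from the sums evaluated at the raw times, and the sums are then re-evaluated at the shifted times T = t - tau, where the cross term CS vanishes. The function name and test data are hypothetical.

```python
import numpy as np


def gls_value_withtau(times, mags, errs, omega):
    """Generalized Lomb-Scargle power at angular frequency omega (uses tau)."""
    w = 1.0 / (errs * errs)
    w = w / w.sum()                      # normalized weights, sum(w_i) = 1

    def sums(t):
        cos_t, sin_t = np.cos(omega * t), np.sin(omega * t)
        Y, C, S = np.sum(w * mags), np.sum(w * cos_t), np.sum(w * sin_t)
        YY = np.sum(w * mags * mags) - Y * Y
        YC = np.sum(w * mags * cos_t) - Y * C
        YS = np.sum(w * mags * sin_t) - Y * S
        CpC = np.sum(w * cos_t * cos_t)
        CC = CpC - C * C
        SS = (1.0 - CpC) - S * S
        CS = np.sum(w * cos_t * sin_t) - C * S
        return YY, YC, YS, CC, SS, CS

    # first pass at the raw times gives tau from tan(2*omega*tau)
    _, _, _, CC, SS, CS = sums(times)
    tau = np.arctan2(2.0 * CS, CC - SS) / (2.0 * omega)

    # second pass at the shifted times T = t - tau
    YY, YC, YS, CC, SS, _ = sums(times - tau)
    return (YC * YC / CC + YS * YS / SS) / YY


# noise-free sinusoid with a constant offset and a 3-day period
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 50.0, 200))
y = 12.0 + 0.3 * np.sin(2.0 * np.pi * t / 3.0 + 0.4)
e = np.full_like(t, 0.01)

p_true = gls_value_withtau(t, y, e, 2.0 * np.pi / 3.0)
p_off = gls_value_withtau(t, y, e, 2.0 * np.pi / 4.1)
```

Because the periodogram is normalized by YY, a sinusoid plus offset that is fit perfectly gives a power of 1 at its true frequency.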

astrobase.periodbase.zgls.
generalized_lsp_value_notau
(times, mags, errs, omega)[source]¶ This is the simplified version not using tau.
The relations used are:
W = sum (1.0/(errs*errs) )
w_i = (1/W)*(1/(errs*errs))
Y = sum( w_i*y_i )
C = sum( w_i*cos(wt_i) )
S = sum( w_i*sin(wt_i) )
YY = sum( w_i*y_i*y_i ) - Y*Y
YC = sum( w_i*y_i*cos(wt_i) ) - Y*C
YS = sum( w_i*y_i*sin(wt_i) ) - Y*S
CpC = sum( w_i*cos(w_t_i)*cos(w_t_i) )
CC = CpC - C*C
SS = (1 - CpC) - S*S
CS = sum( w_i*cos(w_t_i)*sin(w_t_i) ) - C*S
D(omega) = CC*SS - CS*CS
P(omega) = (SS*YC*YC + CC*YS*YS - 2.0*CS*YC*YS)/(YY*D)
Parameters:  times,mags,errs (np.array) – The timeseries to calculate the periodogram value for.
 omega (float) – The frequency to calculate the periodogram value at.
Returns: periodogramvalue – The normalized periodogram at the specified test frequency omega.
Return type: float
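As a hedged illustration of the tau-free relations, the sketch below evaluates P(omega) directly and scans a small grid of trial periods. It is a line-by-line transcription of the formulas above with hypothetical names and synthetic data, not the library's parallelized driver.

```python
import numpy as np


def gls_value_notau(times, mags, errs, omega):
    """Generalized Lomb-Scargle power at angular frequency omega (no tau)."""
    W = np.sum(1.0 / (errs * errs))
    w = (1.0 / W) * (1.0 / (errs * errs))
    cos_wt, sin_wt = np.cos(omega * times), np.sin(omega * times)

    Y = np.sum(w * mags)
    C = np.sum(w * cos_wt)
    S = np.sum(w * sin_wt)
    YY = np.sum(w * mags * mags) - Y * Y
    YC = np.sum(w * mags * cos_wt) - Y * C
    YS = np.sum(w * mags * sin_wt) - Y * S
    CpC = np.sum(w * cos_wt * cos_wt)
    CC = CpC - C * C
    SS = (1.0 - CpC) - S * S
    CS = np.sum(w * cos_wt * sin_wt) - C * S
    D = CC * SS - CS * CS
    return (SS * YC * YC + CC * YS * YS - 2.0 * CS * YC * YS) / (YY * D)


# noisy sinusoid with a 3-day period and unequal error bars
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 60.0, 300))
e = rng.uniform(0.01, 0.03, t.size)
y = 12.0 + 0.2 * np.sin(2.0 * np.pi * t / 3.0) + rng.normal(0.0, e)

# brute-force scan over trial periods; omega = 2*pi/period
trial_periods = np.linspace(2.0, 5.0, 301)
powers = np.array([gls_value_notau(t, y, e, 2.0 * np.pi / p)
                   for p in trial_periods])
best_period = trial_periods[np.argmax(powers)]
```

The explicit CS cross term replaces the time shift by tau, so no second pass over the data is needed for each frequency.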

astrobase.periodbase.zgls.
specwindow_lsp_value
(times, mags, errs, omega)[source]¶ This calculates the peak associated with the spectral window function for times and at the specified omega.
NOTE: this is classical Lomb-Scargle, not the Generalized Lomb-Scargle. mags and errs are silently ignored since we’re calculating the periodogram of the observing window function. These are kept to present a consistent external API so the pgen_lsp function below can call this transparently.
Parameters:  times,mags,errs (np.array) – The timeseries to calculate the periodogram value for.
 omega (float) – The frequency to calculate the periodogram value at.
Returns: periodogramvalue – The normalized periodogram at the specified test frequency omega.
Return type: float

astrobase.periodbase.zgls.
pgen_lsp
(times, mags, errs, magsarefluxes=False, startp=None, endp=None, stepsize=0.0001, autofreq=True, nbestpeaks=5, periodepsilon=0.1, sigclip=10.0, nworkers=None, workchunksize=None, glspfunc=<function _glsp_worker_withtau>, verbose=True)[source]¶ This calculates the generalized Lomb-Scargle periodogram.
Uses the algorithm from Zechmeister and Kurster (2009).
Parameters:  times,mags,errs (np.array) – The mag/flux timeseries with associated measurement errors to run the period-finding on.
 magsarefluxes (bool) – If the input measurement values in mags and errs are in fluxes, set this to True.
 startp,endp (float or None) – The minimum and maximum periods to consider for the period search.
 stepsize (float) – The step-size in frequency to use when constructing a frequency grid for the period search.
 autofreq (bool) – If this is True, the value of stepsize will be ignored and the astrobase.periodbase.get_frequency_grid() function will be used to generate a frequency grid based on startp, and endp. If these are None as well, startp will be set to 0.1 and endp will be set to times.max() - times.min().
 nbestpeaks (int) – The number of ‘best’ peaks to return from the periodogram results, starting from the global maximum of the periodogram peak values.
 periodepsilon (float) – The fractional difference between successive values of ‘best’ periods when sorting by periodogram power to consider them as separate periods (as opposed to part of the same periodogram peak). This is used to avoid broad peaks in the periodogram and make sure the ‘best’ periods returned are all actually independent.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 nworkers (int) – The number of parallel workers to use when calculating the periodogram.
 workchunksize (None or int) – If this is an int, will use chunks of the given size to break up the work for the parallel workers. If None, the chunk size is set to 1.
 glspfunc (Python function) – The worker function to use to calculate the periodogram. This can be used to make this function calculate the timeseries sampling window function instead of the timeseries measurements’ GLS periodogram by passing in _glsp_worker_specwindow instead of the default _glsp_worker_withtau function.
 verbose (bool) – If this is True, will indicate progress and details about the frequency grid used for the period search.
Returns: This function returns a dict, referred to as an lspinfo dict in other astrobase functions that operate on periodogram results. This is a standardized format across all astrobase period-finders, and is of the form below:
{'bestperiod': the best period value in the periodogram,
 'bestlspval': the periodogram peak associated with the best period,
 'nbestpeaks': the input value of nbestpeaks,
 'nbestlspvals': nbestpeaks-size list of best period peak values,
 'nbestperiods': nbestpeaks-size list of best periods,
 'lspvals': the full array of periodogram powers,
 'periods': the full array of periods considered,
 'method':'gls' -> the name of the period-finder method,
 'kwargs':{ dict of all of the input kwargs for record-keeping}}
Return type: dict
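The asymmetric form of sigclip described in the parameters above can be sketched as follows. This is a minimal illustration with a hypothetical helper name, and it makes two assumptions that the text does not state: the clip is centered on the median, and sigma is estimated as 1.483 times the median absolute deviation.

```python
import numpy as np


def asymmetric_sigclip(times, mags, errs, sigclip=(10.0, 3.0),
                       magsarefluxes=False):
    """Clip dimmings beyond sigclip[0] sigma and brightenings beyond sigclip[1].

    ASSUMPTION: median as the center, 1.483 * MAD as the sigma estimate.
    """
    dim_sigma, bright_sigma = sigclip
    center = np.median(mags)
    sigma = 1.483 * np.median(np.abs(mags - center))
    deviation = mags - center

    if magsarefluxes:
        # fluxes: a dimming is a smaller value
        keep = ((deviation > -dim_sigma * sigma) &
                (deviation < bright_sigma * sigma))
    else:
        # magnitudes: a dimming is a larger value
        keep = ((deviation < dim_sigma * sigma) &
                (deviation > -bright_sigma * sigma))
    return times[keep], mags[keep], errs[keep]


# deterministic magnitude series with one dimming and one brightening,
# each roughly 5 sigma away from the median
t = np.arange(200.0)
m = 10.0 + 0.01 * np.sin(np.linspace(0.0, 20.0 * np.pi, 200))
e = np.full(200, 0.005)
m[10] = 10.05   # fainter, so a larger magnitude
m[20] = 9.95    # brighter, so a smaller magnitude

ct, cm, ce = asymmetric_sigclip(t, m, e, sigclip=(10.0, 3.0))
ft, fm, fe = asymmetric_sigclip(t, m, e, sigclip=(10.0, 3.0),
                                magsarefluxes=True)
```

With sigclip=(10.0, 3.0) the dimming survives and the brightening is removed. Setting magsarefluxes=True on the same numbers reverses which point is removed, which is why the flag has to match the data.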

astrobase.periodbase.zgls.
specwindow_lsp
(times, mags, errs, magsarefluxes=False, startp=None, endp=None, stepsize=0.0001, autofreq=True, nbestpeaks=5, periodepsilon=0.1, sigclip=10.0, nworkers=None, glspfunc=<function _glsp_worker_specwindow>, verbose=True)[source]¶ This calculates the spectral window function.
Wraps the pgen_lsp function above to use the specific worker for calculating the window-function.
Parameters:  times,mags,errs (np.array) – The mag/flux timeseries with associated measurement errors to run the period-finding on.
 magsarefluxes (bool) – If the input measurement values in mags and errs are in fluxes, set this to True.
 startp,endp (float or None) – The minimum and maximum periods to consider for the period search.
 stepsize (float) – The step-size in frequency to use when constructing a frequency grid for the period search.
 autofreq (bool) – If this is True, the value of stepsize will be ignored and the astrobase.periodbase.get_frequency_grid() function will be used to generate a frequency grid based on startp, and endp. If these are None as well, startp will be set to 0.1 and endp will be set to times.max() - times.min().
 nbestpeaks (int) – The number of ‘best’ peaks to return from the periodogram results, starting from the global maximum of the periodogram peak values.
 periodepsilon (float) – The fractional difference between successive values of ‘best’ periods when sorting by periodogram power to consider them as separate periods (as opposed to part of the same periodogram peak). This is used to avoid broad peaks in the periodogram and make sure the ‘best’ periods returned are all actually independent.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 nworkers (int) – The number of parallel workers to use when calculating the periodogram.
 glspfunc (Python function) – The worker function to use to calculate the periodogram. This is used to make the pgen_lsp function calculate the time-series sampling window function instead of the time-series measurements’ GLS periodogram by passing in _glsp_worker_specwindow instead of the default _glsp_worker function.
 verbose (bool) – If this is True, will indicate progress and details about the frequency grid used for the period search.
Returns: This function returns a dict, referred to as an lspinfo dict in other astrobase functions that operate on periodogram results. This is a standardized format across all astrobase period-finders, and is of the form below:
{'bestperiod': the best period value in the periodogram,
 'bestlspval': the periodogram peak associated with the best period,
 'nbestpeaks': the input value of nbestpeaks,
 'nbestlspvals': nbestpeaks-size list of best period peak values,
 'nbestperiods': nbestpeaks-size list of best periods,
 'lspvals': the full array of periodogram powers,
 'periods': the full array of periods considered,
 'method':'win' -> the name of the period-finder method,
 'kwargs':{ dict of all of the input kwargs for record-keeping}}
Return type: dict

astrobase.periodbase.zgls.
probability_peak_exceeds_value
(times, peakval)[source]¶ This calculates the probability that periodogram values exceed the given peak value.
This is from page 3 of Zechmeister and Kurster (2009):
Prob(p > p_best) = (1 − p_best)**((N−3)/2)
where:
p_best is the peak value in consideration
N is the number of times
Note that this is for the default normalization of the periodogram, e.g. P_normalized = P(omega), such that P represents the sample variance (see Table 1).
Parameters:  times (np.array) – The times of the observations; N in the relation above is the number of these.
 peakval (float) – A single peak value to calculate the probability for.
Returns: prob – The probability value.
Return type: float
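A small worked example of the relation above, combined with the FAP formula used elsewhere in this module. The function names are hypothetical and the sketch takes N directly rather than a times array. The FAP is evaluated with log1p/expm1, a numerical choice made here (not stated in the source) to avoid losing precision when the single-peak probability is tiny.

```python
import math


def prob_peak_exceeds(n_times, peakval):
    """Prob(p > p_best) = (1 - p_best)**((N - 3)/2)."""
    return (1.0 - peakval) ** ((n_times - 3.0) / 2.0)


def gls_false_alarm_probability(n_times, peakval, m_eff):
    """FAP = 1 - [1 - Prob(p > p_best)]**M, evaluated stably."""
    prob = prob_peak_exceeds(n_times, peakval)
    if prob >= 1.0:
        return 1.0
    # 1 - (1 - prob)**M written as -expm1(M * log1p(-prob))
    return -math.expm1(m_eff * math.log1p(-prob))


# 103 observations and a normalized peak of 0.2 give an exponent of 50
prob = prob_peak_exceeds(103, 0.2)
fap = gls_false_alarm_probability(103, 0.2, m_eff=100)
```

Even a modest normalized peak becomes significant quickly as N grows, because the exponent (N - 3)/2 scales with the number of observations.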

astrobase.periodbase.zgls.
analytic_false_alarm_probability
(lspinfo, times, conservative_nfreq_eff=True, peakvals=None, inplace=True)[source]¶ This returns the analytic false alarm probabilities for periodogram peak values.
The calculation follows that on page 3 of Zechmeister & Kurster (2009):
FAP = 1 − [1 − Prob(z > z0)]**M
where:
M is the number of independent frequencies
Prob(z > z0) is the probability of a peak with value > z0
z0 is the peak value we're evaluating
Parameters:  lspinfo (dict) – The dict returned by the pgen_lsp() function.
 times (np.array) – The times for which the periodogram result in lspinfo was calculated.
 conservative_nfreq_eff (bool) –
If True, will follow the prescription given in Schwarzenberg-Czerny (2003):
http://adsabs.harvard.edu/abs/2003ASPC..292..383S
and estimate the effective number of independent frequencies M_eff as:
min(N_obs, N_freq, DELTA_f/delta_f)
 peakvals (sequence or None) – The peak values for which to evaluate the false-alarm probability. If None, will calculate this for each of the peak values in the nbestpeaks key of the lspinfo dict.
 inplace (bool) – If True, puts the results of the FAP calculation into the lspinfo dict as a list available as lspinfo['falsealarmprob'].
Returns: The calculated false alarm probabilities for each of the peak values in peakvals.
Return type: list
astrobase.periodbase.macf module¶
This contains the ACF period-finding algorithm from McQuillan+ 2013a and McQuillan+ 2014.

astrobase.periodbase.macf.
plot_acf_results
(acfp, outfile, maxlags=5000, yrange=(-0.4, 0.4))[source]¶ This plots the unsmoothed/smoothed ACF vs lag.
Parameters:  acfp (dict) – This is the dict returned from macf_period_find below.
 outfile (str) – The output file the plot will be written to.
 maxlags (int) – The maximum number of lags to include in the plot.
 yrange (sequence of two floats) – The y-range of the ACF vs. lag plot to use.

astrobase.periodbase.macf.
macf_period_find
(times, mags, errs, fillgaps=0.0, filterwindow=11, forcetimebin=None, maxlags=None, maxacfpeaks=10, smoothacf=21, smoothfunc=<function _smooth_acf_savgol>, smoothfunckwargs=None, magsarefluxes=False, sigclip=3.0, verbose=True, periodepsilon=0.1, nworkers=None, startp=None, endp=None, autofreq=None, stepsize=None)[source]¶ This finds periods using the McQuillan+ (2013a, 2014) ACF method.
The kwargs from periodepsilon to stepsize don’t do anything but are used to present a consistent API for all periodbase period-finders to an outside driver (e.g. the one in the checkplotserver).
Parameters:  times,mags,errs (np.array) – The input magnitude/flux timeseries to run the period-finding for.
 fillgaps ('noiselevel' or float) – This sets what to use to fill in gaps in the time series. If this is ‘noiselevel’, will smooth the light curve using a point window size of filterwindow (this should be an odd integer), subtract the smoothed LC from the actual LC and estimate the RMS. This RMS will be used to fill in the gaps. Other useful values here are 0.0, and np.nan.
 filterwindow (int) – The light curve’s smoothing filter window size to use if fillgaps=’noiselevel’.
 forcetimebin (None or float) – This is used to force a particular cadence in the light curve other than the automatically determined cadence. This effectively rebins the light curve to this cadence. This should be in the same time units as times.
 maxlags (None or int) – This is the maximum number of lags to calculate. If None, will calculate all lags.
 maxacfpeaks (int) – This is the maximum number of ACF peaks to use when finding the highest peak and obtaining a fit period.
 smoothacf (int) –
This is the number of points to use as the window size when smoothing the ACF with the smoothfunc. This should be an odd integer value. If this is None, will not smooth the ACF, but this will probably lead to finding spurious peaks in a generally noisy ACF.
For Kepler, a value between 21 and 51 seems to work fine. For ground based data, much larger values may be necessary: between 1001 and 2001 seem to work best for the HAT surveys. This is dependent on cadence, RMS of the light curve, the periods of the objects you’re looking for, and finally, any correlated noise in the light curve. Make a plot of the smoothed/unsmoothed ACF vs. lag using the result dict of this function and the plot_acf_results function above to see the identified ACF peaks and what kind of smoothing might be needed.
The value of smoothacf will also be used to figure out the interval to use when searching for local peaks in the ACF: this interval is 1/2 of the smoothacf value.
 smoothfunc (Python function) – This is the function that will be used to smooth the ACF. This should take at least one kwarg: ‘windowsize’. Other kwargs can be passed in using a dict provided in smoothfunckwargs. By default, this uses a Savitzky-Golay filter; a Gaussian filter is also provided but not used. Another good option would be an actual low-pass filter (generated using scipy.signal?) to remove all high frequency noise from the ACF.
 smoothfunckwargs (dict or None) – The dict of optional kwargs to pass in to the smoothfunc.
 magsarefluxes (bool) – If your input measurements in mags are actually fluxes instead of mags, set this to True.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 verbose (bool) – If True, will indicate progress and report errors.
Returns: Returns a dict with results. dict['bestperiod'] is the estimated best period and dict['fitperiodrms'] is its estimated error. Other interesting things in the output include:
 dict['acfresults']: all results from calculating the ACF. In particular, the unsmoothed ACF might be of interest: dict['acfresults']['acf'] and dict['acfresults']['lags'].
 dict['lags'] and dict['acf'] contain the ACF after smoothing was applied.
 dict['periods'] and dict['lspvals'] can be used to construct a pseudo-periodogram.
 dict['naivebestperiod'] is obtained by multiplying the lag at the highest ACF peak with the cadence. This is usually close to the fit period (dict['fitbestperiod']), which is calculated by doing a fit to the lags vs. peak index relation as in McQuillan+ 2014.
Return type: dict
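The 'naive' period estimate mentioned above (lag of the highest ACF peak times the cadence) can be sketched as below. This is a simplified illustration with hypothetical function names: it assumes an evenly sampled, gap-free series, applies no smoothing, and omits the lag vs. peak-index fit of McQuillan+ 2014.

```python
import numpy as np


def autocorrelation(mags, maxlags):
    """Normalized autocorrelation for lags 0 .. maxlags-1."""
    y = mags - mags.mean()
    norm = np.sum(y * y)
    return np.array([np.sum(y[:y.size - k] * y[k:]) / norm
                     for k in range(maxlags)])


def naive_acf_period(mags, cadence, maxlags):
    """Lag of the highest local ACF maximum (excluding lag 0) times cadence."""
    acf = autocorrelation(mags, maxlags)
    interior = np.arange(1, maxlags - 1)
    is_peak = ((acf[interior] > acf[interior - 1]) &
               (acf[interior] > acf[interior + 1]))
    peak_lags = interior[is_peak]
    if peak_lags.size == 0:
        return None                      # no local maximum found
    best_lag = peak_lags[np.argmax(acf[peak_lags])]
    return best_lag * cadence


# evenly sampled sinusoid: 0.1-day cadence, 5-day period
cadence = 0.1
t = np.arange(1000) * cadence
m = np.sin(2.0 * np.pi * t / 5.0)
period = naive_acf_period(m, cadence, maxlags=200)
```

On real data the raw ACF is noisy, which is why the smoothacf window matters: the peak search is only as reliable as the smoothing applied before it.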
This package contains parallelized implementations of several period-finding algorithms.
astrobase.lcfit package¶
Fitting routines for light curves. Includes:
 astrobase.lcfit.sinusoidal.fourier_fit_magseries(): fit an arbitrary order Fourier series to a magnitude/flux time series.
 astrobase.lcfit.nonphysical.spline_fit_magseries(): fit a univariate cubic spline to a magnitude/flux time series with a specified spline knot fraction.
 astrobase.lcfit.nonphysical.savgol_fit_magseries(): apply a Savitzky-Golay smoothing filter to a magnitude/flux time series, returning the resulting smoothed function as a “fit”.
 astrobase.lcfit.nonphysical.legendre_fit_magseries(): fit a Legendre function of the specified order to the magnitude/flux time series.
 astrobase.lcfit.eclipses.gaussianeb_fit_magseries(): fit a double inverted gaussian eclipsing binary model to the magnitude/flux time series.
 astrobase.lcfit.transits.traptransit_fit_magseries(): fit a trapezoid-shaped transit signal to the magnitude/flux time series.
 astrobase.lcfit.transits.mandelagol_fit_magseries(): fit a Mandel & Agol (2002) planet transit model to the flux time series.
 astrobase.lcfit.transits.mandelagol_and_line_fit_magseries(): fit a Mandel & Agol 2002 model, + a local line to the flux time series.
 astrobase.lcfit.transits.fivetransitparam_fit_magseries(): fit out a line around each transit window in the given light curve, and then fit the light curve for t0, period, a/Rstar, Rp/Rstar, and inclination.
Submodules¶
astrobase.lcfit.eclipses module¶
Light curve fitting routines for eclipsing binaries:
 astrobase.lcfit.eclipses.gaussianeb_fit_magseries(): fit a double inverted gaussian eclipsing binary model to the magnitude/flux time series.

astrobase.lcfit.eclipses.
gaussianeb_fit_magseries
(times, mags, errs, ebparams, param_bounds=None, scale_errs_redchisq_unity=True, sigclip=10.0, plotfit=False, magsarefluxes=False, verbose=True, curve_fit_kwargs=None)[source]¶ This fits a double inverted gaussian EB model to a magnitude time series.
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to fit the EB model to.
 period (float) – The period to use for EB fit.
 ebparams (list of float) –
This is a list containing the eclipsing binary parameters:
ebparams = [period (time), epoch (time), pdepth (mags), pduration (phase), psdepthratio, secondaryphase]
period is the period in days.
epoch is the time of primary minimum in JD.
pdepth is the depth of the primary eclipse:
 for magnitudes -> pdepth should be < 0
 for fluxes -> pdepth should be > 0
pduration is the length of the primary eclipse in phase.
psdepthratio is the ratio of the secondary eclipse depth to that of the primary eclipse.
secondaryphase is the phase at which the minimum of the secondary eclipse is located. This effectively parameterizes eccentricity.
If epoch is None, this function will do an initial spline fit to find an approximate minimum of the phased light curve using the given period.
The pdepth provided is checked against the value of magsarefluxes. If magsarefluxes = True, the pdepth is forced to be > 0; if magsarefluxes = False, the pdepth is forced to be < 0.
 param_bounds (dict or None) –
This is a dict of the upper and lower bounds on each fit parameter. Should be of the form:
{'period': (lower_bound_period, upper_bound_period),
 'epoch': (lower_bound_epoch, upper_bound_epoch),
 'pdepth': (lower_bound_pdepth, upper_bound_pdepth),
 'pduration': (lower_bound_pduration, upper_bound_pduration),
 'psdepthratio': (lower_bound_psdepthratio, upper_bound_psdepthratio),
 'secondaryphase': (lower_bound_secondaryphase, upper_bound_secondaryphase)}
 To indicate that a parameter is fixed, use ‘fixed’ instead of a tuple providing its lower and upper bounds.
 To indicate that a parameter has no bounds, don’t include it in the param_bounds dict.
If this is None, the default value of this kwarg will be:
{'period':(0.0,np.inf), # period is between 0 and inf
 'epoch':(0.0, np.inf), # epoch is between 0 and inf
 'pdepth':(-np.inf,np.inf), # pdepth is between -np.inf and np.inf
 'pduration':(0.0,1.0), # pduration is between 0.0 and 1.0
 'psdepthratio':(0.0,1.0), # psdepthratio is between 0.0 and 1.0
 'secondaryphase':(0.0,1.0)} # secondaryphase is between 0.0 and 1.0
 scale_errs_redchisq_unity (bool) – If True, the standard errors on the fit parameters will be scaled to make the reduced chi-sq = 1.0. This sets the absolute_sigma kwarg for the scipy.optimize.curve_fit function to False.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 magsarefluxes (bool) – If True, will treat the input values of mags as fluxes for purposes of plotting the fit and sigclipping.
 plotfit (str or False) – If this is a string, this function will make a plot for the fit to the mag/flux timeseries and writes the plot to the path specified here.
 ignoreinitfail (bool) – If this is True, ignores the initial failure to find a set of optimized Fourier parameters using the global optimization function and proceeds to do a leastsquares fit anyway.
 verbose (bool) – If True, will indicate progress and warn of any problems.
 curve_fit_kwargs (dict or None) – If not None, this should be a dict containing extra kwargs to pass to the scipy.optimize.curve_fit function.
Returns: This function returns a dict containing the model fit parameters, the minimized chisq value and the reduced chisq value. The form of this dict is mostly standardized across all functions in this module:
{
    'fittype':'gaussianeb',
    'fitinfo':{
        'initialparams':the initial EB params provided,
        'finalparams':the final model fit EB params,
        'finalparamerrs':formal errors in the params,
        'fitmags': the model fit mags,
        'fitepoch': the epoch of minimum light for the fit,
    },
    'fitchisq': the minimized value of the fit's chisq,
    'fitredchisq':the reduced chisq value,
    'fitplotfile': the output fit plot if fitplot is not None,
    'magseries':{
        'times':input times in phase order of the model,
        'phase':the phases of the model mags,
        'mags':input mags/fluxes in the phase order of the model,
        'errs':errs in the phase order of the model,
        'magsarefluxes':input value of magsarefluxes kwarg
    }
}
Return type: dict
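The param_bounds and sigclip conventions described above can be sketched in a few lines. This is an illustration only, not Astrobase's implementation: the helper names curve_fit_bounds and asymmetric_sigclip, the fixed_eps argument, the way a ‘fixed’ parameter is pinned (a very narrow interval around its initial value), and the use of the median with 1.483 × MAD as the sigma estimate are all assumptions made for this example.

```python
import math
import numpy as np


def curve_fit_bounds(param_names, init_values, param_bounds, fixed_eps=1e-9):
    """Turn a param_bounds-style dict into (lower, upper) lists.

    ASSUMPTION: a 'fixed' parameter is pinned with a very narrow interval
    around its initial value; fixed_eps is a hypothetical knob.
    """
    lower, upper = [], []
    for name, init in zip(param_names, init_values):
        bound = param_bounds.get(name)
        if bound is None:
            # not in the dict: the parameter has no bounds
            lo, hi = -math.inf, math.inf
        elif bound == 'fixed':
            lo, hi = init - fixed_eps, init + fixed_eps
        else:
            lo, hi = bound
        lower.append(lo)
        upper.append(hi)
    return lower, upper


def asymmetric_sigclip(times, mags, errs, sigclip=(10.0, 3.0),
                       magsarefluxes=False):
    """Clip dimmings and brightenings with separate sigma thresholds.

    ASSUMPTION: sigma is estimated as 1.483 * MAD around the median.
    """
    finite = np.isfinite(times) & np.isfinite(mags) & np.isfinite(errs)
    t, m, e = times[finite], mags[finite], errs[finite]
    if sigclip is None:
        return t, m, e
    if np.isscalar(sigclip):
        dim_sigma = bright_sigma = float(sigclip)
    else:
        dim_sigma, bright_sigma = sigclip
    median = np.median(m)
    sigma = 1.483 * np.median(np.abs(m - median))
    # positive "dimming" means fainter: larger mags, or smaller fluxes
    dimming = (median - m) if magsarefluxes else (m - median)
    keep = (dimming <= dim_sigma * sigma) & (-dimming <= bright_sigma * sigma)
    return t[keep], m[keep], e[keep]


names = ['period', 'epoch', 'pdepth']
lower, upper = curve_fit_bounds(
    names, [2.0, 100.0, -0.1],
    {'period': 'fixed', 'epoch': (0.0, math.inf)},
)

demo_times = np.arange(10, dtype=float)
demo_mags = np.array([10.0, 10.1, 9.9, 10.05, 9.95,
                      10.0, 10.02, 9.98, 10.5, 9.5])
demo_errs = np.full(10, 0.01)
_, kept_as_mags, _ = asymmetric_sigclip(demo_times, demo_mags, demo_errs)
_, kept_as_flux, _ = asymmetric_sigclip(demo_times, demo_mags, demo_errs,
                                        magsarefluxes=True)
```

With sigclip=[10., 3.] the same 0.5 excursion is kept when it is a dimming and removed when it is a brightening, which is why the result changes when magsarefluxes is flipped.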
astrobase.lcfit.nonphysical module¶
Light curve fitting routines for ‘nonphysical’ models:
astrobase.lcfit.nonphysical.spline_fit_magseries(): fit a univariate cubic spline to a magnitude/flux time series with a specified spline knot fraction.
astrobase.lcfit.nonphysical.savgol_fit_magseries(): apply a Savitzky-Golay smoothing filter to a magnitude/flux time series, returning the resulting smoothed function as a “fit”.
astrobase.lcfit.nonphysical.legendre_fit_magseries(): fit a Legendre function of the specified order to the magnitude/flux time series.

astrobase.lcfit.nonphysical.spline_fit_magseries(times, mags, errs, period, knotfraction=0.01, maxknots=30, sigclip=30.0, plotfit=False, ignoreinitfail=False, magsarefluxes=False, verbose=True)[source]¶
This fits a univariate cubic spline to the phased light curve.
This fit may be better than the Fourier fit for sharply variable objects, like EBs, so can be used to distinguish them from other types of variables.
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to fit a spline to.
 period (float) – The period to use for the spline fit.
 knotfraction (float) – The knot fraction sets the number of internal knots to use for the spline, expressed as a fraction of the total number of non-NaN observations. A value of 0.01 (or 1%) appears to work quite well, without overfitting. maxknots controls the maximum number of knots that will be allowed.
 maxknots (int) – The maximum number of knots that will be used, even if knotfraction gives a larger value. This helps avoid overfitting to short-timescale variations.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 magsarefluxes (bool) – If True, will treat the input values of mags as fluxes for purposes of plotting the fit and sigclipping.
 plotfit (str or False) – If this is a string, this function will make a plot of the fit to the mag/flux timeseries and write the plot to the path specified here.
 ignoreinitfail (bool) – If this is True, ignores the initial failure to find a set of optimized Fourier parameters using the global optimization function and proceeds to do a least-squares fit anyway.
 verbose (bool) – If True, will indicate progress and warn of any problems.
Returns: This function returns a dict containing the model fit parameters, the minimized chisq value and the reduced chisq value. The form of this dict is mostly standardized across all functions in this module:
{
    'fittype':'spline',
    'fitinfo':{
        'nknots': the number of knots used for the fit,
        'fitmags': the model fit mags,
        'fitepoch': the epoch of minimum light for the fit,
    },
    'fitchisq': the minimized value of the fit's chisq,
    'fitredchisq':the reduced chisq value,
    'fitplotfile': the output fit plot if fitplot is not None,
    'magseries':{
        'times':input times in phase order of the model,
        'phase':the phases of the model mags,
        'mags':input mags/fluxes in the phase order of the model,
        'errs':errs in the phase order of the model,
        'magsarefluxes':input value of magsarefluxes kwarg
    }
}
Return type: dict
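To make the roles of knotfraction and maxknots concrete, here is a minimal sketch of a phased cubic-spline fit built on scipy.interpolate.LSQUnivariateSpline. It is not the Astrobase source: the helper name spline_fit_phased, the knot placement (evenly spaced in phase, slightly inside the ends), folding at times.min(), and the degrees of freedom used for the reduced chisq are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline


def spline_fit_phased(times, mags, errs, period,
                      knotfraction=0.01, maxknots=30):
    """Fit a weighted cubic spline to a phase-folded light curve."""
    finite = np.isfinite(times) & np.isfinite(mags) & np.isfinite(errs)
    t, m, e = times[finite], mags[finite], errs[finite]

    # fold at the first observation (assumption) and sort by phase
    phase = ((t - t.min()) / period) % 1.0
    order = np.argsort(phase)
    t, phase, m, e = t[order], phase[order], m[order], e[order]

    # number of interior knots: a fraction of the points, capped by maxknots
    nknots = min(max(int(knotfraction * phase.size), 1), maxknots)
    knots = np.linspace(phase[0] + 0.01, phase[-1] - 0.01, nknots)

    spline = LSQUnivariateSpline(phase, m, t=knots, w=1.0 / e, k=3)
    fitmags = spline(phase)

    chisq = np.sum(((m - fitmags) / e) ** 2)
    redchisq = chisq / (phase.size - nknots - 1)  # dof choice is an assumption
    return {
        'fittype': 'spline',
        'fitinfo': {'nknots': nknots, 'fitmags': fitmags},
        'fitchisq': chisq,
        'fitredchisq': redchisq,
        'magseries': {'times': t, 'phase': phase, 'mags': m, 'errs': e},
    }


demo_times = np.linspace(0.0, 10.0, 1000)
demo_period = 1.7
demo_mags = 12.0 + 0.3 * np.sin(2.0 * np.pi * demo_times / demo_period)
demo_errs = np.full_like(demo_times, 0.01)

spline_result = spline_fit_phased(demo_times, demo_mags, demo_errs, demo_period)
capped_result = spline_fit_phased(demo_times, demo_mags, demo_errs,
                                  demo_period, knotfraction=0.5)
```

With 1000 points, knotfraction=0.01 gives 10 knots, while knotfraction=0.5 would ask for 500 and is capped at maxknots=30.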

astrobase.lcfit.nonphysical.savgol_fit_magseries(times, mags, errs, period, windowlength=None, polydeg=2, sigclip=30.0, plotfit=False, magsarefluxes=False, verbose=True)[source]¶
Fit a Savitzky-Golay filter to the magnitude/flux time series.
SG fits successive subsets (windows) of adjacent data points with a low-order polynomial via least squares. At each point (magnitude), it returns the value of the polynomial at that magnitude’s time. This is made significantly cheaper than actually performing least squares for each window through linear algebra tricks that are possible when specifying the window size and polynomial order beforehand. Numerical Recipes Ch 14.8 gives an overview; Eq. 14.8.6 is what Scipy has implemented.
The idea behind Savitzky-Golay is to preserve higher moments (>=2) of the input data series than would be done by a simple moving window average.
Note that the filter assumes evenly spaced data, which magnitude time series are not. By pretending the data points are evenly spaced, we introduce an additional noise source in the function values. This is a relatively small noise source provided that the change in the magnitude values across the full width of the N=windowlength point window is < sqrt(N/2) times the measurement noise on a single point.
TODO:  Find correct dof for reduced chi squared in savgol_fit_magseries
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to fit the Savitzky-Golay model to.
 period (float) – The period to use for the model fit.
 windowlength (None or int) – The length of the filter window (the number of coefficients). Must be either positive and odd, or None. (The window is the number of points to the left, and to the right, of whatever point is having a polynomial fit to it locally). Bigger windows at fixed polynomial order risk lowering the amplitude of sharp features. If None, this routine (arbitrarily) sets the windowlength for phased LCs to be either the number of finite data points divided by 300, or polydeg+3, whichever is bigger.
 polydeg (int) – This is the order of the polynomial used to fit the samples. Must be less than windowlength. “Higher-order filters do better at preserving feature heights and widths, but do less smoothing on broader features.” (Numerical Recipes).
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 magsarefluxes (bool) – If True, will treat the input values of mags as fluxes for purposes of plotting the fit and sigclipping.
 plotfit (str or False) – If this is a string, this function will make a plot of the fit to the mag/flux timeseries and write the plot to the path specified here.
 ignoreinitfail (bool) – If this is True, ignores the initial failure to find a set of optimized Fourier parameters using the global optimization function and proceeds to do a least-squares fit anyway.
 verbose (bool) – If True, will indicate progress and warn of any problems.
Returns: This function returns a dict containing the model fit parameters, the minimized chisq value and the reduced chisq value. The form of this dict is mostly standardized across all functions in this module:
{
    'fittype':'savgol',
    'fitinfo':{
        'windowlength': the window length used for the fit,
        'polydeg':the polynomial degree used for the fit,
        'fitmags': the model fit mags,
        'fitepoch': the epoch of minimum light for the fit,
    },
    'fitchisq': the minimized value of the fit's chisq,
    'fitredchisq':the reduced chisq value,
    'fitplotfile': the output fit plot if fitplot is not None,
    'magseries':{
        'times':input times in phase order of the model,
        'phase':the phases of the model mags,
        'mags':input mags/fluxes in the phase order of the model,
        'errs':errs in the phase order of the model,
        'magsarefluxes':input value of magsarefluxes kwarg
    }
}
Return type: dict
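The default window-length rule and the smoothing step can be sketched with scipy.signal.savgol_filter. This is a hedged illustration rather than the function's actual code: the helper names, bumping an even window up to the next odd number, folding at times.min(), and the use of mode='wrap' (treating the phased curve as periodic) are assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter


def default_windowlength(npoints, polydeg=2):
    """Larger of npoints/300 and polydeg+3, forced to be odd (assumption)."""
    windowlength = max(polydeg + 3, int(npoints / 300))
    if windowlength % 2 == 0:
        windowlength += 1
    return windowlength


def savgol_smooth_phased(times, mags, period, windowlength=None, polydeg=2):
    """Smooth a phase-folded light curve with a Savitzky-Golay filter."""
    finite = np.isfinite(times) & np.isfinite(mags)
    t, m = times[finite], mags[finite]

    phase = ((t - t.min()) / period) % 1.0
    order = np.argsort(phase)
    phase, m = phase[order], m[order]

    if windowlength is None:
        windowlength = default_windowlength(m.size, polydeg)

    # the filter treats the points as evenly spaced, as noted in the text
    fitmags = savgol_filter(m, windowlength, polydeg, mode='wrap')
    return phase, m, fitmags, windowlength


rng = np.random.default_rng(0)
demo_times = np.linspace(0.0, 30.0, 3000)
demo_period = 2.3
demo_truth = 0.2 * np.sin(2.0 * np.pi * demo_times / demo_period)
demo_mags = demo_truth + rng.normal(0.0, 0.05, demo_times.size)

sg_phase, sg_mags, sg_fit, sg_window = savgol_smooth_phased(
    demo_times, demo_mags, demo_period
)
# the noiseless signal in phase order, for comparison
sg_truth = 0.2 * np.sin(2.0 * np.pi * sg_phase)
```

For 3000 points the rule gives 3000/300 = 10, which is raised to 11 so the window is odd; a short light curve falls back to polydeg + 3.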

astrobase.lcfit.nonphysical.legendre_fit_magseries(times, mags, errs, period, legendredeg=10, sigclip=30.0, plotfit=False, magsarefluxes=False, verbose=True)[source]¶
Fit an arbitrary-order Legendre series, via least squares, to the magnitude/flux time series.
This is a series of the form:
p(x) = c_0*L_0(x) + c_1*L_1(x) + c_2*L_2(x) + ... + c_n*L_n(x)
where L_i’s are Legendre polynomials (also called “Legendre functions of the first kind”) and c_i’s are the coefficients being fit.
This function is mainly just a wrapper to numpy.polynomial.legendre.Legendre.fit.
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to fit a Legendre series polynomial to.
 period (float) – The period to use for the Legendre fit.
 legendredeg (int) – This is n in the equation above, e.g. if you give n=5, you will get 6 coefficients. This number should be much less than the number of data points you are fitting.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 magsarefluxes (bool) – If True, will treat the input values of mags as fluxes for purposes of plotting the fit and sigclipping.
 plotfit (str or False) – If this is a string, this function will make a plot of the fit to the mag/flux timeseries and write the plot to the path specified here.
 ignoreinitfail (bool) – If this is True, ignores the initial failure to find a set of optimized Fourier parameters using the global optimization function and proceeds to do a least-squares fit anyway.
 verbose (bool) – If True, will indicate progress and warn of any problems.
Returns: This function returns a dict containing the model fit parameters, the minimized chisq value and the reduced chisq value. The form of this dict is mostly standardized across all functions in this module:
{
    'fittype':'legendre',
    'fitinfo':{
        'legendredeg': the Legendre polynomial degree used,
        'fitmags': the model fit mags,
        'fitepoch': the epoch of minimum light for the fit,
    },
    'fitchisq': the minimized value of the fit's chisq,
    'fitredchisq':the reduced chisq value,
    'fitplotfile': the output fit plot if fitplot is not None,
    'magseries':{
        'times':input times in phase order of the model,
        'phase':the phases of the model mags,
        'mags':input mags/fluxes in the phase order of the model,
        'errs':errs in the phase order of the model,
        'magsarefluxes':input value of magsarefluxes kwarg
    }
}
Return type: dict
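Since the function is described as a wrapper around numpy.polynomial.legendre.Legendre.fit, the core step can be shown directly. The sketch below is illustrative only: the helper name legendre_fit_phased, the 1/errs weighting, folding at times.min(), and the degrees of freedom used for the reduced chisq are assumptions rather than confirmed details of Astrobase.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre


def legendre_fit_phased(times, mags, errs, period, legendredeg=10):
    """Least-squares Legendre series fit to a phase-folded light curve."""
    finite = np.isfinite(times) & np.isfinite(mags) & np.isfinite(errs)
    t, m, e = times[finite], mags[finite], errs[finite]

    phase = ((t - t.min()) / period) % 1.0
    order = np.argsort(phase)
    phase, m, e = phase[order], m[order], e[order]

    # p(x) = c_0*L_0(x) + ... + c_n*L_n(x), with n = legendredeg
    series = Legendre.fit(phase, m, legendredeg, w=1.0 / e)
    fitmags = series(phase)

    nparams = legendredeg + 1
    chisq = np.sum(((m - fitmags) / e) ** 2)
    redchisq = chisq / (m.size - nparams)  # dof choice is an assumption
    return series, phase, m, fitmags, redchisq


demo_times = np.linspace(0.0, 5.0, 200)
demo_period = 1.3
demo_phase = ((demo_times - demo_times.min()) / demo_period) % 1.0
# a quadratic in phase, which a degree-5 series represents exactly
demo_mags = 10.0 + 0.5 * (demo_phase - 0.5) ** 2
demo_errs = np.full_like(demo_times, 0.01)

leg_series, leg_phase, leg_mags, leg_fit, leg_redchisq = legendre_fit_phased(
    demo_times, demo_mags, demo_errs, demo_period, legendredeg=5
)
```

A degree of 5 yields 6 coefficients (c_0 through c_5), matching the note on legendredeg above.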
astrobase.lcfit.sinusoidal module¶
Light curve fitting routines for sinusoidal models:
astrobase.lcfit.sinusoidal.fourier_fit_magseries(): fit an arbitrary order Fourier series to a magnitude/flux time series.

astrobase.lcfit.sinusoidal.fourier_fit_magseries(times, mags, errs, period, fourierorder=None, fourierparams=None, fix_period=True, scale_errs_redchisq_unity=True, sigclip=3.0, magsarefluxes=False, plotfit=False, ignoreinitfail=True, verbose=True, curve_fit_kwargs=None)[source]¶
This fits a Fourier series to a mag/flux time series.
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to fit a Fourier cosine series to.
 period (float) – The period to use for the Fourier fit.
 fourierorder (None or int) – If this is an int, will be interpreted as the Fourier order of the series to fit to the input mag/flux timeseries. If this is None and fourierparams is specified, fourierparams will be used directly to generate the fit Fourier series. If fourierparams is also None, this function will try to fit a Fourier cosine series of order 3 to the mag/flux timeseries.
 fourierparams (list of floats or None) –
If this is specified as a list of floats, it must be of the form below:
[fourier_amp1, fourier_amp2, fourier_amp3,...,fourier_ampN, fourier_phase1, fourier_phase2, fourier_phase3,...,fourier_phaseN]
to specify a Fourier cosine series of order N. If this is None and fourierorder is specified, the Fourier order specified there will be used to construct the Fourier cosine series used to fit the input mag/flux timeseries. If both are None, this function will try to fit a Fourier cosine series of order 3 to the input mag/flux timeseries.
 fix_period (bool) – If True, will keep the period fixed while fitting the sinusoidal function to the phased light curve.
 scale_errs_redchisq_unity (bool) – If True, the standard errors on the fit parameters will be scaled to make the reduced chisq = 1.0. This sets the absolute_sigma kwarg for the scipy.optimize.curve_fit function to False.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 magsarefluxes (bool) – If True, will treat the input values of mags as fluxes for purposes of plotting the fit and sigclipping.
 plotfit (str or False) – If this is a string, this function will make a plot of the fit to the mag/flux timeseries and write the plot to the path specified here.
 ignoreinitfail (bool) – If this is True, ignores the initial failure to find a set of optimized Fourier parameters using the global optimization function and proceeds to do a least-squares fit anyway.
 verbose (bool) – If True, will indicate progress and warn of any problems.
 curve_fit_kwargs (dict or None) – If not None, this should be a dict containing extra kwargs to pass to the scipy.optimize.curve_fit function.
Returns: This function returns a dict containing the model fit parameters, the minimized chisq value and the reduced chisq value. The form of this dict is mostly standardized across all functions in this module:
{
    'fittype':'fourier',
    'fitinfo':{
        'finalparams': the list of final model fit params,
        'finalparamerrs': list of errs for each model fit param,
        'fitmags': the model fit mags,
        'fitperiod': the fit period if this wasn't set to fixed,
        'fitepoch': this is times.min() for this fit type,
        'actual_fitepoch': time of minimum light from fit model
        ... other fit function specific keys ...
    },
    'fitchisq': the minimized value of the fit's chisq,
    'fitredchisq':the reduced chisq value,
    'fitplotfile': the output fit plot if fitplot is not None,
    'magseries':{
        'times':input times in phase order of the model,
        'phase':the phases of the model mags,
        'mags':input mags/fluxes in the phase order of the model,
        'errs':errs in the phase order of the model,
        'magsarefluxes':input value of magsarefluxes kwarg
    }
}
NOTE: the value of ‘fitepoch’ in the ‘fitinfo’ dict returned by this function is the time of the first observation, since this is where the LC is folded for the fit procedure. To get the actual time of minimum light as calculated by a spline fit to the phased LC, use the key ‘actual_fitepoch’ in the ‘fitinfo’ dict.
Return type: dict
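The fourierparams layout (all amplitudes first, then all phases) and the effect of scale_errs_redchisq_unity on scipy.optimize.curve_fit can be illustrated as follows. This is a sketch under stated assumptions, not the library's code: the exact functional form of the cosine series, the helper names, the default starting guess, and fitting baseline-subtracted values at already-computed phases are all choices made for this example.

```python
import numpy as np
from scipy.optimize import curve_fit


def fourier_cosine_series(phase, *fourierparams):
    """Sum over k of amp_k * cos(2*pi*k*phase + phase_k), for k = 1..N.

    fourierparams = [amp_1, ..., amp_N, phase_1, ..., phase_N]
    (ASSUMED functional form for this sketch.)
    """
    order = len(fourierparams) // 2
    amps, phis = fourierparams[:order], fourierparams[order:]
    model = np.zeros_like(phase, dtype=float)
    for k in range(order):
        model += amps[k] * np.cos(2.0 * np.pi * (k + 1) * phase + phis[k])
    return model


def fit_fourier_series(phase, relmags, errs, fourierorder=3,
                       fourierparams=None, scale_errs_redchisq_unity=True):
    """Fit the series to baseline-subtracted mags/fluxes at given phases."""
    if fourierparams is None:
        # hypothetical starting guess: amplitudes first, then phases
        fourierparams = ([0.6] + [0.2] * (fourierorder - 1)
                         + [0.1] * fourierorder)
    popt, pcov = curve_fit(
        fourier_cosine_series, phase, relmags,
        p0=fourierparams, sigma=errs,
        # scaling the errors to redchisq = 1 means absolute_sigma=False
        absolute_sigma=not scale_errs_redchisq_unity,
    )
    fitmags = fourier_cosine_series(phase, *popt)
    chisq = np.sum(((relmags - fitmags) / errs) ** 2)
    return {
        'finalparams': popt,
        'finalparamerrs': np.sqrt(np.diag(pcov)),
        'fitmags': fitmags,
        'fitchisq': chisq,
        'fitredchisq': chisq / (phase.size - len(popt)),
    }


demo_phase = np.linspace(0.0, 1.0, 200, endpoint=False)
demo_true = [0.5, 0.1, 0.3, 1.0]   # order 2: two amplitudes, two phases
demo_relmags = fourier_cosine_series(demo_phase, *demo_true)
demo_errs = np.full_like(demo_phase, 0.01)

fourier_result = fit_fourier_series(
    demo_phase, demo_relmags, demo_errs,
    fourierparams=[0.4, 0.05, 0.0, 0.8],
)
```

Passing a four-element fourierparams list selects an order-2 series here; leaving it as None falls back to fourierorder, mirroring the precedence described in the parameter notes.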
astrobase.lcfit.transits module¶
Fitting routines for planetary transits:
astrobase.lcfit.transits.traptransit_fit_magseries(): fit a trapezoid-shaped transit signal to the magnitude/flux time series.
astrobase.lcfit.transits.mandelagol_fit_magseries(): fit a Mandel & Agol (2002) planet transit model to the flux time series, fixing some parameters (e.g., eccentricity) and varying other parameters (e.g., t0, period, a/Rstar). Priors must be passed by user.
astrobase.lcfit.transits.mandelagol_and_line_fit_magseries(): fit a Mandel & Agol 2002 model, + a local line to the flux time series. Priors must be passed by user.
astrobase.lcfit.transits.fivetransitparam_fit_magseries(): fit out a line around each transit window in the given light curve, and then fit all the transits in the light curve for t0, period, a/Rstar, Rp/Rstar, and inclination. Fixes e to 0, and uses theoretical quadratic limb darkening coefficients in the bandpass given by the user. Figures out the priors, user only needs to pass stellar parameters instead.

astrobase.lcfit.transits.traptransit_fit_magseries(times, mags, errs, transitparams, param_bounds=None, scale_errs_redchisq_unity=True, sigclip=10.0, plotfit=False, magsarefluxes=False, verbose=True, curve_fit_kwargs=None)[source]¶
This fits a trapezoid transit model to a magnitude time series.
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to fit a trapezoid planettransit model to.
 period (float) – The period to use for the model fit.
 transitparams (list of floats) –
These are initial parameters for the transit model fit. A list of the following form is required:
transitparams = [transit_period (time),
                 transit_epoch (time),
                 transit_depth (flux or mags),
                 transit_duration (phase),
                 ingress_duration (phase)]
 for magnitudes -> transit_depth should be < 0
 for fluxes -> transit_depth should be > 0
If transitepoch is None, this function will do an initial spline fit to find an approximate minimum of the phased light curve using the given period.
The transitdepth provided is checked against the value of magsarefluxes. If magsarefluxes = True, the transitdepth is forced to be > 0; if magsarefluxes = False, the transitdepth is forced to be < 0.
 param_bounds (dict or None) –
This is a dict of the upper and lower bounds on each fit parameter. Should be of the form:
{'period': (lower_bound_period, upper_bound_period),
 'epoch': (lower_bound_epoch, upper_bound_epoch),
 'depth': (lower_bound_depth, upper_bound_depth),
 'duration': (lower_bound_duration, upper_bound_duration),
 'ingressduration': (lower_bound_ingressduration, upper_bound_ingressduration)}
 To indicate that a parameter is fixed, use ‘fixed’ in place of the tuple giving its lower and upper bounds.
 To indicate that a parameter has no bounds, don’t include it in the param_bounds dict.
If this is None, the default value of this kwarg will be:
{'period':(0.0,np.inf),       # period is between 0 and inf
 'epoch':(0.0, np.inf),       # epoch is between 0 and inf
 'depth':(-np.inf,np.inf),    # depth is between -np.inf and np.inf
 'duration':(0.0,1.0),        # duration is between 0.0 and 1.0
 'ingressduration':(0.0,0.5)} # ingress duration between 0.0 and 0.5
 scale_errs_redchisq_unity (bool) – If True, the standard errors on the fit parameters will be scaled to make the reduced chisq = 1.0. This sets the absolute_sigma kwarg for the scipy.optimize.curve_fit function to False.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 magsarefluxes (bool) – If True, will treat the input values of mags as fluxes for purposes of plotting the fit and sigclipping.
 plotfit (str or False) – If this is a string, this function will make a plot of the fit to the mag/flux timeseries and write the plot to the path specified here.
 ignoreinitfail (bool) – If this is True, ignores the initial failure to find a set of optimized Fourier parameters using the global optimization function and proceeds to do a least-squares fit anyway.
 verbose (bool) – If True, will indicate progress and warn of any problems.
 curve_fit_kwargs (dict or None) – If not None, this should be a dict containing extra kwargs to pass to the scipy.optimize.curve_fit function.
Returns: This function returns a dict containing the model fit parameters, the minimized chisq value and the reduced chisq value. The form of this dict is mostly standardized across all functions in this module:
{
    'fittype':'traptransit',
    'fitinfo':{
        'initialparams':the initial transit params provided,
        'finalparams':the final model fit transit params,
        'finalparamerrs':formal errors in the params,
        'fitmags': the model fit mags,
        'fitepoch': the epoch of minimum light for the fit,
        'ntransitpoints': the number of LC points in transit phase
    },
    'fitchisq': the minimized value of the fit's chisq,
    'fitredchisq':the reduced chisq value,
    'fitplotfile': the output fit plot if fitplot is not None,
    'magseries':{
        'times':input times in phase order of the model,
        'phase':the phases of the model mags,
        'mags':input mags/fluxes in the phase order of the model,
        'errs':errs in the phase order of the model,
        'magsarefluxes':input value of magsarefluxes kwarg
    }
}
Return type: dict
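A trapezoid transit model with the five transitparams listed above might look like the sketch below. The precise definitions are assumptions made for illustration (the duration is taken as the full first-to-fourth-contact width in phase, the ingress duration as the width of each sloped side, and the model as baseline minus depth); Astrobase's own model function may define these differently.

```python
import numpy as np


def trapezoid_transit(times, period, epoch, depth, duration,
                      ingress_duration, baseline=1.0):
    """Evaluate a trapezoid-shaped transit at the given times.

    duration and ingress_duration are in units of phase. The model is
    baseline - depth in transit, so depth > 0 dims a flux series and
    depth < 0 increases (dims) a magnitude series.
    """
    times = np.asarray(times, dtype=float)
    # phase in [-0.5, 0.5), with 0 at mid-transit
    phase = ((times - epoch) / period + 0.5) % 1.0 - 0.5
    absphase = np.abs(phase)
    half = duration / 2.0

    model = np.full_like(phase, baseline, dtype=float)

    # flat bottom of the trapezoid
    flat = absphase <= (half - ingress_duration)
    model[flat] = baseline - depth

    # linear ingress and egress ramps
    ramp = (absphase > (half - ingress_duration)) & (absphase < half)
    model[ramp] = baseline - depth * (half - absphase[ramp]) / ingress_duration
    return model


# flux example: mid-transit, mid-orbit, next mid-transit, halfway up a ramp
demo_flux = trapezoid_transit([0.0, 1.0, 2.0, 0.08], period=2.0, epoch=0.0,
                              depth=0.01, duration=0.1, ingress_duration=0.02)

# magnitude example: a negative depth makes the in-transit mag larger
demo_mag = trapezoid_transit([0.0, 1.0], period=2.0, epoch=0.0,
                             depth=-0.01, duration=0.1, ingress_duration=0.02,
                             baseline=12.0)
```

Writing the model as baseline minus depth is one way to stay consistent with the sign rule above: a positive depth lowers a flux, and a negative depth raises a magnitude.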

astrobase.lcfit.transits.log_posterior_transit(theta, params, model, t, flux, err_flux, priorbounds)[source]¶
Evaluate posterior probability given proposed model parameters and the observed flux timeseries.

astrobase.lcfit.transits.log_posterior_transit_plus_line(theta, params, model, t, flux, err_flux, priorbounds)[source]¶
Evaluate posterior probability given proposed model parameters and the observed flux timeseries.
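For readers unfamiliar with how such a log-posterior is usually assembled, the following sketch combines uniform priors (from a priorbounds-style dict) with a Gaussian likelihood. It does not reproduce the signatures above: the function names, the names argument, and the plain-callable model (a straight line instead of a BATMAN transit model) are hypothetical stand-ins.

```python
import numpy as np


def log_uniform_prior(theta, names, priorbounds):
    """0 inside every (lower, upper) bound, -inf otherwise."""
    for value, name in zip(theta, names):
        lower, upper = priorbounds[name]
        if not (lower < value < upper):
            return -np.inf
    return 0.0


def log_gaussian_likelihood(theta, model, t, flux, err_flux):
    """Independent Gaussian errors on each flux measurement."""
    resid = flux - model(theta, t)
    return -0.5 * np.sum((resid / err_flux) ** 2
                         + np.log(2.0 * np.pi * err_flux ** 2))


def log_posterior(theta, names, model, t, flux, err_flux, priorbounds):
    """Log prior plus log likelihood, short-circuiting outside the prior."""
    lp = log_uniform_prior(theta, names, priorbounds)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_gaussian_likelihood(theta, model, t, flux, err_flux)


def line_model(theta, t):
    # stand-in model: intercept (poly_order0) and slope (poly_order1)
    return theta[0] + theta[1] * t


demo_names = ['poly_order0', 'poly_order1']
demo_priorbounds = {'poly_order0': (0.5, 1.5), 'poly_order1': (-0.5, 0.5)}
demo_t = np.array([0.0, 1.0, 2.0])
demo_flux = np.ones(3)
demo_err = np.full(3, 0.1)

lp_inside = log_posterior([1.0, 0.0], demo_names, line_model,
                          demo_t, demo_flux, demo_err, demo_priorbounds)
lp_outside = log_posterior([2.0, 0.0], demo_names, line_model,
                           demo_t, demo_flux, demo_err, demo_priorbounds)
```

Returning -inf as soon as a parameter leaves its bounds lets a sampler such as emcee reject the proposal without evaluating the model.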

astrobase.lcfit.transits.mandelagol_fit_magseries(times, mags, errs, fitparams, priorbounds, fixedparams, trueparams=None, burninpercent=0.3, plotcorner=False, samplesavpath=False, n_walkers=50, n_mcmc_steps=400, exp_time_minutes=2, eps=0.0001, skipsampling=False, overwriteexistingsamples=False, mcmcprogressbar=False, plotfit=False, magsarefluxes=False, sigclip=10.0, verbose=True, nworkers=4)[source]¶
This fits a Mandel & Agol (2002) planetary transit model to a flux time series. You can fit and fix whatever parameters you want.
It relies on Kreidberg (2015)’s BATMAN implementation for the transit model, emcee to sample the posterior (Foreman-Mackey et al 2013), corner to plot it, and h5py to save the samples. See e.g., Claret’s work for good guesses of star-appropriate limb-darkening parameters.
NOTE: this only works for flux timeseries at the moment.
NOTE: Between the fitparams, priorbounds, and fixedparams dicts, you must specify all of the planetary transit parameters required by BATMAN: [‘t0’, ‘rp’, ‘sma’, ‘incl’, ‘u’, ‘ecc’, ‘omega’, ‘period’], or the BATMAN model will fail to initialize.
Parameters:  times,mags,errs (np.array) – The input flux timeseries to fit the transit model to.
 fitparams (dict) –
This is the initial parameter guesses for MCMC, found e.g., by BLS. The key string format must not be changed, but any parameter can be either “fit” or “fixed”. If it is “fit”, it must have a corresponding prior. For example:
fitparams = {'t0':1325.9, 'rp':np.sqrt(fitd['transitdepth']), 'sma':6.17, 'incl':85, 'u':[0.3, 0.2]}
where ‘u’ is a list of the limb-darkening coefficients, linear first, then quadratic. Quadratic limb darkening is the only form implemented.
 priorbounds (dict) –
This sets the lower & upper bounds on uniform prior, e.g.:
priorbounds = {'rp':(0.135, 0.145), 'u_linear':(0.3-1, 0.3+1), 'u_quad':(0.2-1, 0.2+1), 't0':(np.min(time), np.max(time)), 'sma':(6,6.4), 'incl':(80,90)}
 fixedparams (dict) –
This sets which parameters are fixed, and their values. For example:
fixedparams = {'ecc':0., 'omega':90., 'limb_dark':'quadratic', 'period':fitd['period'] }
limb_dark must be “quadratic”. It’s “fixed”, because once you choose your limb-darkening model, it’s fixed.
 trueparams (list of floats) – The true parameter values you’re fitting for, if they’re known (e.g., a known planet, or fake data). Only for plotting purposes.
 burninpercent (float) – The fraction of MCMC samples to discard as burn-in (the default of 0.3 discards 30%).
 plotcorner (str or False) – If this is a str, points to the path of output corner plot that will be generated for this MCMC run.
 samplesavpath (str) – This must be provided so emcee can save its MCMC samples to disk as HDF5 files. This will set the path of the output HDF5 file written.
 n_walkers (int) – The number of MCMC walkers to use.
 n_mcmc_steps (int) – The number of MCMC steps to take.
 exp_time_minutes (int) – Exposure time, in minutes, passed to transit model to smear observations.
 eps (float) – The radius of the n_walkers-dimensional Gaussian ball used to initialize the MCMC.
 skipsampling (bool) – If you’ve already collected MCMC samples, and you do not want any more sampling (e.g., just make the plots), set this to be True.
 overwriteexistingsamples (bool) – If you’ve collected samples, but you want to overwrite them, set this to True. Usually, it should be False, which appends samples to samplesavpath HDF5 file.
 mcmcprogressbar (bool) – If True, will show a progress bar for the MCMC process.
 plotfit (str or bool) – If a str, indicates the path of the output fit plot file. If False, no fit plot will be made.
 magsarefluxes (bool) – This indicates if the input measurements in mags are actually fluxes.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.], will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 verbose (bool) – If True, will indicate MCMC progress.
 nworkers (int) – The number of parallel workers to launch for MCMC.
Returns: This function returns a dict containing the model fit parameters and other fit information. The form of this dict is mostly standardized across all functions in this module:
{
    'fittype':'mandelagol',
    'fitinfo':{
        'initialparams':the initial transit params provided,
        'fixedparams':the fixed transit params provided,
        'finalparams':the final model fit transit params,
        'finalparamerrs':formal errors in the params,
        'fitmags': the model fit mags,
        'fitepoch': the epoch of minimum light for the fit,
        'acceptancefraction': fraction of MCMC ensemble. low=bad.
        'autocorrtime': if autocorrtime ~= n_mcmc_steps, not good.
    },
    'fitplotfile': the output fit plot if fitplot is not None,
    'magseries':{
        'times':input times in phase order of the model,
        'phase':the phases of the model mags,
        'mags':input mags/fluxes in the phase order of the model,
        'errs':errs in the phase order of the model,
        'magsarefluxes':input value of magsarefluxes kwarg
    }
}
Return type: dict
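Because a missing key only shows up when the BATMAN model fails to initialize, it can help to sanity-check the three dicts before starting a long MCMC run. The checker below is a hypothetical helper, not part of Astrobase; the required-key list is taken from the NOTE above, and the mapping of ‘u’ to the ‘u_linear’ and ‘u_quad’ prior keys follows the priorbounds example.

```python
REQUIRED_TRANSIT_PARAMS = ('t0', 'rp', 'sma', 'incl', 'u',
                           'ecc', 'omega', 'period')


def check_transit_params(fitparams, priorbounds, fixedparams):
    """Return a list of problems found in the three input dicts."""
    problems = []

    # every required parameter must be either fit or fixed
    supplied = set(fitparams) | set(fixedparams)
    for name in REQUIRED_TRANSIT_PARAMS:
        if name not in supplied:
            problems.append('missing parameter: %s' % name)

    # every fit parameter needs a prior; 'u' needs one per coefficient
    for name in fitparams:
        prior_keys = ('u_linear', 'u_quad') if name == 'u' else (name,)
        for key in prior_keys:
            if key not in priorbounds:
                problems.append('no prior bounds for fit parameter: %s' % key)

    # a parameter should not be both fit and fixed
    for name in sorted(set(fitparams) & set(fixedparams)):
        problems.append('parameter is both fit and fixed: %s' % name)

    return problems


# placeholder numbers in the spirit of the examples above
demo_fitparams = {'t0': 1325.9, 'rp': 0.14, 'sma': 6.17,
                  'incl': 85, 'u': [0.3, 0.2]}
demo_priorbounds = {'rp': (0.135, 0.145), 'u_linear': (0.3 - 1, 0.3 + 1),
                    'u_quad': (0.2 - 1, 0.2 + 1), 't0': (1325.0, 1327.0),
                    'sma': (6, 6.4), 'incl': (80, 90)}
demo_fixedparams = {'ecc': 0., 'omega': 90., 'limb_dark': 'quadratic',
                    'period': 3.5}

ok_problems = check_transit_params(demo_fitparams, demo_priorbounds,
                                   demo_fixedparams)

no_sma_prior = {k: v for k, v in demo_priorbounds.items() if k != 'sma'}
bad_problems = check_transit_params(demo_fitparams, no_sma_prior,
                                    demo_fixedparams)
```

An empty list means the dicts are mutually consistent under these assumptions; it says nothing about whether the values themselves are physically sensible.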

astrobase.lcfit.transits.mandelagol_and_line_fit_magseries(times, mags, errs, fitparams, priorbounds, fixedparams, trueparams=None, burninpercent=0.3, plotcorner=False, timeoffset=0, samplesavpath=False, n_walkers=50, n_mcmc_steps=400, exp_time_minutes=2, eps=0.0001, skipsampling=False, overwriteexistingsamples=False, mcmcprogressbar=False, plotfit=False, scatterxdata=None, scatteryaxes=None, magsarefluxes=True, sigclip=10.0, verbose=True, nworkers=4)[source]¶
The model fit by this function is: a Mandel & Agol (2002) transit, PLUS a line. You can fit and fix whatever parameters you want.
Typical use case: you want to measure transit times of individual SNR >~ 50 transits. You fix all the transit parameters except for the midtime, and also fit for a line locally.
NOTE: this only works for flux timeseries at the moment.
NOTE: Between the fitparams, priorbounds, and fixedparams dicts, you must specify all of the planetary transit parameters required by BATMAN and the parameters for the line fit: [‘t0’, ‘rp’, ‘sma’, ‘incl’, ‘u’, ‘ecc’, ‘omega’, ‘period’, ‘poly_order0’, ‘poly_order1’], or the BATMAN model will fail to initialize.
Parameters:  times,mags,errs (np.array) – The input flux timeseries to fit the transit-plus-line model to.
 fitparams (dict) –
This is the initial parameter guesses for MCMC, found e.g., by BLS. The key string format must not be changed, but any parameter can be either “fit” or “fixed”. If it is “fit”, it must have a corresponding prior. For example:
fitparams = {'t0':1325.9, 'poly_order0':1, 'poly_order1':0.}
where t0 is the time of transit center for a reference transit. poly_order0 corresponds to the intercept of the line, poly_order1 is the slope.
 priorbounds (dict) –
This sets the lower & upper bounds on uniform prior, e.g.:
priorbounds = {'t0':(np.min(time), np.max(time)), 'poly_order0':(0.5,1.5), 'poly_order1':(-0.5,0.5) }
 fixedparams (dict) –
This sets which parameters are fixed, and their values. For example:
fixedparams = {'ecc':0., 'omega':90., 'limb_dark':'quadratic', 'period':fitd['period'], 'rp':np.sqrt(fitd['transitdepth']), 'sma':6.17, 'incl':85, 'u':[0.3, 0.2]}
limb_dark must be “quadratic”. It’s “fixed”, because once you choose your limb-darkening model, it’s fixed.
 trueparams (list of floats) – The true parameter values you’re fitting for, if they’re known (e.g., a known planet, or fake data). Only for plotting purposes.
 burninpercent (float) – The portion of MCMC samples to discard as burn-in, given as a fraction (the default is 0.3).
 plotcorner (str or False) – If this is a str, it is the path of the output corner plot that will be generated for this MCMC run.
 timeoffset (float) – Set this if the input times are offset by some constant and you want the saved pickles to correct for that offset.
 samplesavpath (str) – This must be provided so emcee can save its MCMC samples to disk as HDF5 files. This sets the path of the output HDF5 file written.
 n_walkers (int) – The number of MCMC walkers to use.
 n_mcmc_steps (int) – The number of MCMC steps to take.
 exp_time_minutes (int) – Exposure time, in minutes, passed to transit model to smear observations.
 eps (float) – The radius of the n_walkers-dimensional Gaussian ball used to initialize the MCMC.
 skipsampling (bool) – If you’ve already collected MCMC samples, and you do not want any more sampling (e.g., just make the plots), set this to be True.
 overwriteexistingsamples (bool) – If you’ve collected samples, but you want to overwrite them, set this to True. Usually, it should be False, which appends samples to samplesavpath HDF5 file.
 mcmcprogressbar (bool) – If True, will show a progress bar for the MCMC process.
 plotfit (str or bool) – If a str, indicates the path of the output fit plot file. If False, no fit plot will be made.
 scatterxdata (np.array or None) – Use this to overplot x,y scatter points on the output model/data light curve (e.g., to highlight bad data, or to indicate an ephemeris). This takes an np.ndarray with the same units as times.
 scatteryaxes (np.array or None) – Use this to provide the y-values for scatterxdata, in units of fraction of an axis.
 magsarefluxes (bool) – This indicates if the input measurements in mags are actually fluxes.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigma-clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 verbose (bool) – If True, will indicate MCMC progress.
 nworkers (int) – The number of parallel workers to launch for MCMC.
Returns: This function returns a dict containing the model fit parameters and other fit information. The form of this dict is mostly standardized across all functions in this module:
{
    'fittype':'mandelagol_and_line',
    'fitinfo':{
        'initialparams':the initial transit params provided,
        'fixedparams':the fixed transit params provided,
        'finalparams':the final model fit transit params,
        'finalparamerrs':formal errors in the params,
        'fitmags': the model fit mags,
        'fitepoch': the epoch of minimum light for the fit,
        'acceptancefraction': fraction of MCMC ensemble. low=bad.
        'autocorrtime': if autocorrtime ~= n_mcmc_steps, not good.
    },
    'fitplotfile': the output fit plot if fitplot is not None,
    'magseries':{
        'times':input times in phase order of the model,
        'phase':the phases of the model mags,
        'mags':input mags/fluxes in the phase order of the model,
        'errs':errs in the phase order of the model,
        'magsarefluxes':input value of magsarefluxes kwarg
    }
}
Return type: dict
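As a rough illustration of the NOTE above, the three dicts can be checked for consistency before calling the fit function. The helper below is hypothetical (it is not part of astrobase), the set of required keys is taken from the NOTE, and the numeric values (period, rp, and the time stamps) are made up for the example.

```python
import numpy as np

# Keys listed in the NOTE above; treated here as the full required set.
REQUIRED_KEYS = {'t0', 'rp', 'sma', 'incl', 'u', 'ecc', 'omega',
                 'period', 'poly_order0', 'poly_order1'}


def check_param_dicts(fitparams, priorbounds, fixedparams):
    """Hypothetical helper (not part of astrobase): list inconsistencies."""
    problems = []
    # every required parameter must be either fit or fixed
    missing = REQUIRED_KEYS - set(fitparams) - set(fixedparams)
    if missing:
        problems.append('not fit or fixed: %s' % sorted(missing))
    # a parameter cannot be fit and fixed at the same time
    for key in sorted(set(fitparams) & set(fixedparams)):
        problems.append('%s is both fit and fixed' % key)
    # every fit parameter needs prior bounds that contain its initial guess
    for key, guess in fitparams.items():
        if key not in priorbounds:
            problems.append('%s has no prior bounds' % key)
            continue
        lower, upper = priorbounds[key]
        if not lower <= guess <= upper:
            problems.append('%s initial guess is outside its prior' % key)
    return problems


times = np.linspace(1325.0, 1327.0, 200)   # made-up time stamps
fitparams = {'t0': 1325.9, 'poly_order0': 1.0, 'poly_order1': 0.0}
priorbounds = {'t0': (np.min(times), np.max(times)),
               'poly_order0': (0.5, 1.5),
               'poly_order1': (-0.5, 0.5)}
fixedparams = {'ecc': 0., 'omega': 90., 'limb_dark': 'quadratic',
               'period': 3.5, 'rp': 0.1, 'sma': 6.17, 'incl': 85,
               'u': [0.3, 0.2]}

problems = check_param_dicts(fitparams, priorbounds, fixedparams)
print(problems or 'parameter dicts look consistent')
```

A check like this catches a missing key before the transit model is initialized, which is cheaper than discovering it partway into an MCMC run.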

astrobase.lcfit.transits.
fivetransitparam_fit_magseries
(times, mags, errs, teff, rstar, logg, identifier, fit_savdir, chain_savdir, n_mcmc_steps=1, overwriteexistingsamples=False, burninpercent=0.3, n_transit_durations=5, make_tlsfit_plot=True, exp_time_minutes=30, bandpass='tess', magsarefluxes=True, nworkers=32)[source]¶ Wrapper to mandelagol_fit_magseries that fits out a line around each transit window in the given light curve, and then fits the entire light curve for (t0, period, a/Rstar, Rp/Rstar, inclination). Fixes e to 0, and uses theoretical quadratic limb darkening coefficients in the bandpass given by the user, as found with the stellar parameters. Figures out the priors for you.
Typical use case: you have a light curve with >=2 transits in it. You want to fit the entire light curve for the parameters noted above, but you don’t want to need to manually determine all the priors.
Parameters:  times,mags,errs (np.array) – The input flux timeseries to fit.
 teff,rstar,logg (float) – Stellar parameters [K, Rsun, cgs] used to get limb darkening coefficients.
 identifier (str) – String that goes into file names to identify the object being fit. E.g., fit CSV file will be at {fit_savdir}/{identifier}_fivetransitparam_fitresults.csv
 fit_savdir (str) – Path to directory where CSV results of fits, fit status files, and diagnostic plots are saved. If it doesn’t exist, this function tries to make it.
 chain_savdir (str) – Path to directory where MCMC chains are saved.
 n_mcmc_steps (int) – Number of steps to run MCMC. (Note: convergence not guaranteed).
 overwriteexistingsamples (bool) – If False and a pickle file with saved parameters is found (in fit_savdir), no additional MCMC sampling is done.
 exp_time_minutes (int) – Exposure time in minutes. Used for the model fitting.
 n_transit_durations (int) – The points used in the fit are only those within +/- N transit durations of each transit midpoint. This is to prevent excessive out-of-transit data being used in the fit (these points do not inform the model’s parameters).
Returns: (mafr, tlsr, is_converged) –
mafr is the Mandel-Agol fit result dictionary, which contains the same information as that returned by mandelagol_and_line_fit_magseries. Fit parameters are accessed like mafr['fitinfo']['finalparams']['sma'].
tlsr is the TLS result dictionary, containing keys documented in periodbase/htls.tls_parallel_pfind.
is_converged is a boolean indicating whether the fitting converged, according to the chain autocorrelation time.
Return type: tuple
astrobase.lcfit.utils module¶
This contains utilities for fitting routines in the rest of this subpackage.

astrobase.lcfit.utils.
get_phased_quantities
(stimes, smags, serrs, period)[source]¶ Does phase-folding for the mag/flux timeseries given a period.
Given finite and sigma-clipped times, magnitudes, and errors, along with the period at which to phase-fold the data, perform the phase-folding and return the phase-folded values.
Parameters:  stimes,smags,serrs (np.array) – The sigma-clipped and finite input mag/flux timeseries arrays to operate on.
 period (float) – The period to phase the mag/flux timeseries at. stimes.min() is used as the epoch value to fold the timeseries around.
Returns: (phase, pmags, perrs, ptimes, mintime) – The tuple returned contains the following items:
 phase: phase-sorted values of phase at each of stimes
 pmags: phase-sorted magnitudes at each phase
 perrs: phase-sorted errors
 ptimes: phase-sorted times
 mintime: earliest time in stimes.
Return type: tuple

astrobase.lcfit.utils.
make_fit_plot
(phase, pmags, perrs, fitmags, period, mintime, magseriesepoch, plotfit, magsarefluxes=False, wrap=False, model_over_lc=True, fitphase=None)[source]¶ This makes a plot of the LC model fit.
Parameters:  phase,pmags,perrs (np.array) – The actual mag/flux timeseries.
 fitmags (np.array) – The model fit timeseries.
 period (float) – The period at which the phased LC was generated.
 mintime (float) – The minimum time value.
 magseriesepoch (float) – The value of time around which the phased LC was folded.
 plotfit (str) – The name of a file to write the plot to.
 magsarefluxes (bool) – Set this to True if the values in pmags and fitmags are actually fluxes.
 wrap (bool) – If True, will wrap the phased LC around 0.0 to make some phased LCs easier to look at.
 model_over_lc (bool) – Usually, this function will plot the actual LC over the model LC. Set this to True to plot the model over the actual LC; this is most useful when you have a very dense light curve and want to be able to see how it follows the model.
 fitphase (optional np.array) – If passed, use this as x values for fitmags
Returns: Nothing.

astrobase.lcfit.utils.
iterative_fit
(data_x, data_y, init_coeffs, objective_func, objective_args=None, objective_kwargs=None, optimizer_func=<function least_squares>, optimizer_kwargs=None, optimizer_needs_scalar=False, objective_residualarr_func=None, fit_iterations=5, fit_reject_sigma=3.0, verbose=True, full_output=False)[source]¶ This is a function to run iterative fitting based on repeated sigma-clipping of fit outliers.
Parameters:  data_x (np.array) – Array of the independent variable.
 data_y (np.array) – Array of the dependent variable.
 init_coeffs – The initial values of the fit function coefficients.
 objective_func (Python function) –
A function that is used to calculate residuals between the model and the data_y array. This should have a signature similar to:
def objective_func(fit_coeffs, data_x, data_y, *objective_args, **objective_kwargs)
and return an array of residuals or a scalar value indicating some sort of sum of residuals (depending on what the optimizer function requires).
If this function returns a scalar value, you must set optimizer_needs_scalar to True, and provide a Python function in objective_residualarr_func that returns an array of residuals for each value of data_x and data_y given an array of fit coefficients.
 objective_args (tuple or None) – A tuple of arguments to pass into the objective_func.
 objective_kwargs (dict or None) – A dict of keyword arguments to pass into the objective_func.
 optimizer_func (Python function) –
The function that minimizes the residual between the model and the data_y array using the objective_func. This should have a signature similar to one of the optimizer functions in scipy.optimize, i.e.:
def optimizer_func(objective_func, initial_coeffs, args=(), kwargs={}, ...)
and return a scipy.optimize.OptimizeResult. We’ll rely on the .success attribute to determine if the fit was successful, and the .x attribute to get the values of the fit coefficients.
 optimizer_kwargs (dict or None) – A dict of kwargs to pass into the optimizer_func function.
 optimizer_needs_scalar (bool) – If True, this indicates that the optimizer requires a scalar value to be returned from the objective_func. This is the case for scipy.optimize.minimize. If this is True, you must also provide a function in objective_residualarr_func.
 objective_residualarr_func (Python function) –
This is used in conjunction with optimizer_needs_scalar. The function provided here must return an array of residuals for each value of data_x and data_y given an array of fit coefficients. This is then used to calculate which points are outliers after a fit iteration. The function here must have the following signature:
def objective_residualarr_func(coeffs, data_x, data_y, *objective_args, **objective_kwargs)
 fit_iterations (int) – The number of iterations of the fit to perform while throwing out outliers to the fit.
 fit_reject_sigma (float) – The maximum deviation allowed to consider a data_y item as an outlier to the fit and to remove it from consideration in a successive iteration of the fit.
 verbose (bool) – If True, reports per iteration on the cost function value and the number of items remaining in data_x and data_y after sigma-clipping outliers.
 full_output (bool) – If True, returns the full output from the optimizer_func along with the resulting fit function coefficients.
Returns: result – If full_output was True, will return the fit coefficients np.array as the first element and the optimizer function fit output from the last iteration as the second element of a tuple. If full_output was False, will only return the final fit coefficients as an np.array.
Return type: np.array or tuple
astrobase.lcmath module¶
Contains various useful tools for calculating various things related to light curves (like phasing, sigma-clipping, finding and filling gaps, etc.)

astrobase.lcmath.
find_lc_timegroups
(lctimes, mingap=4.0)[source]¶ Finds gaps in the provided timeseries and indexes them into groups.
This finds the gaps in the provided lctimes array, so we can figure out which times are for consecutive observations and which represent gaps between seasons or observing eras.
Parameters:  lctimes (array-like) – This contains the times to analyze for gaps; assumed to be some form of Julian date.
 mingap (float) – The minimum difference between consecutive measurements required to consider them parts of different timegroups. By default it is set to 4.0 days.
Returns: A tuple of the form: (ngroups, [slice(start_ind_1, end_ind_1), …]) is returned. This contains the number of groups as the first element, and a list of Python slice objects for each timegroup found. These can be used directly to index into the array of times to quickly get measurements associated with each group.
Return type: tuple
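A minimal sketch of the gap-finding idea, assuming the times are already sorted and finite. The function name find_timegroups is made up for this example; it only mirrors the return format described above.

```python
import numpy as np


def find_timegroups(lctimes, mingap=4.0):
    """Split sorted times into groups separated by gaps longer than mingap."""
    lctimes = np.asarray(lctimes, dtype=float)
    # index of the last point before each gap
    gap_inds = np.where(np.diff(lctimes) > mingap)[0]
    # a new group starts right after each gap
    starts = np.concatenate(([0], gap_inds + 1))
    stops = np.concatenate((gap_inds + 1, [lctimes.size]))
    groups = [slice(int(s), int(e)) for s, e in zip(starts, stops)]
    return len(groups), groups


times = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 30.0])
ngroups, groups = find_timegroups(times)
for group in groups:
    print(times[group])
```

Returning slice objects rather than index arrays keeps the groups cheap to store and lets them index any array aligned with the times.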

astrobase.lcmath.
normalize_magseries
(times, mags, mingap=4.0, normto='globalmedian', magsarefluxes=False, debugmode=False)[source]¶ This normalizes the magnitude timeseries to a specified value.
This is used to normalize time series measurements that may have large time gaps and vertical offsets in mag/flux measurement between these ‘timegroups’, either due to instrument changes or different filters.
NOTE: this works in-place! The mags array will be replaced with normalized mags when this function finishes.
Parameters:  times,mags (array-like) – The times (assumed to be some form of JD) and mags (or flux) measurements to be normalized.
 mingap (float) – The minimum difference between consecutive measurements required to consider them parts of different timegroups. By default it is set to 4.0 days.
 normto ({'globalmedian', 'zero'} or a float) –
Specifies the normalization type:
'globalmedian' -> norms each mag to the global median of the LC column
'zero' -> norms each mag to zero
a float -> norms each mag to this specified float value.
 magsarefluxes (bool) –
Indicates if the input mags array is actually an array of flux measurements instead of magnitude measurements. If this is set to True, then:
 if normto is ‘zero’, then each observation’s flux value is divided by the median flux to yield normalized fluxes with 1.0 as the global median.
 if normto is ‘globalmedian’, then the global median flux value across the entire time series is multiplied with each measurement.
 if normto is set to a float, then this number is multiplied with the flux value for each measurement.
 debugmode (bool) – If this is True, will print out verbose info on each timegroup found.
Returns: times,normalized_mags – The input times and the normalized magnitude (or flux) values. If normalization fails for some reason, times and normalized_mags will both be None.
Return type: np.arrays
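The per-timegroup normalization for normto='globalmedian' can be sketched as follows. This is an illustration under stated assumptions, not the library code: the inputs are taken to be sorted and finite, the function name is made up, and unlike the in-place behavior noted above, this sketch returns a copy.

```python
import numpy as np


def normalize_by_timegroup(times, mags, mingap=4.0, magsarefluxes=False):
    """Remove offsets between timegroups, then restore the global median."""
    times = np.asarray(times, dtype=float)
    normed = np.array(mags, dtype=float)      # a copy; the input is untouched
    global_median = np.median(normed)

    # a new group starts after every gap longer than mingap
    starts = np.concatenate(([0], np.where(np.diff(times) > mingap)[0] + 1))
    stops = np.concatenate((starts[1:], [times.size]))

    for start, stop in zip(starts, stops):
        group = normed[start:stop]
        group_median = np.median(group)
        if magsarefluxes:
            # fluxes: divide out the group median, scale to the global median
            normed[start:stop] = group / group_median * global_median
        else:
            # magnitudes: subtract the group median, add the global median
            normed[start:stop] = group - group_median + global_median
    return times, normed


times = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
mags = np.array([10.0, 10.1, 9.9, 12.0, 12.1, 11.9])   # 2-mag offset after the gap
ntimes, nmags = normalize_by_timegroup(times, mags)
print(nmags)
```

Magnitudes are logarithmic, so offsets are removed by subtraction, while fluxes are linear and are rescaled by division.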

astrobase.lcmath.
sigclip_magseries
(times, mags, errs, sigclip=None, iterative=False, niterations=None, meanormedian='median', magsarefluxes=False)[source]¶ Sigma-clips a magnitude or flux timeseries.
Selects the finite times, magnitudes (or fluxes), and errors from the passed values, and applies symmetric or asymmetric sigma-clipping to them.
Parameters:  times,mags,errs (np.array) –
The magnitude or flux timeseries arrays to sigma-clip. This doesn’t assume all values are finite or if they’re positive/negative. All of these arrays will have their non-finite elements removed, and then will be sigma-clipped based on the arguments to this function.
errs is optional. Set it to None if you don’t have values for these. A ‘faked’ errs array will be generated if necessary, which can be ignored in the output as well.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigma-clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 iterative (bool) – If this is set to True, will perform iterative sigma-clipping. If niterations is not set and this is True, sigma-clipping is iterated until no more points are removed.
 niterations (int) – The maximum number of iterations to perform for sigma-clipping. If None, the iterative arg takes precedence, and iterative=True will sigma-clip until no more points are removed. If niterations is not None and iterative is False, niterations takes precedence and iteration will occur for the specified number of iterations.
 meanormedian ({'mean', 'median'}) – Use ‘mean’ for sigma-clipping based on the mean value, or ‘median’ for sigma-clipping based on the median value. Default is ‘median’.
 magsarefluxes (bool) – True if your “mags” are in fact fluxes, i.e. if “fainter” corresponds to mags getting smaller.
Returns: (stimes, smags, serrs) – The sigma-clipped and nan-stripped timeseries.
Return type: tuple
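The asymmetric clip is easiest to see in code. The sketch below is a single-pass illustration with a made-up function name; the scatter estimate (1.483 times the median absolute deviation) is an assumption of this example rather than a statement about the library internals, and the synthetic fluxes are generated with a fixed random seed.

```python
import numpy as np


def asymmetric_sigclip(times, mags, errs, sigclip=(10.0, 3.0),
                       magsarefluxes=False):
    """Drop non-finite points, then clip dimmings and brightenings
    with separate sigma thresholds (single pass)."""
    times, mags, errs = (np.asarray(a, dtype=float)
                         for a in (times, mags, errs))
    finite = np.isfinite(times) & np.isfinite(mags) & np.isfinite(errs)
    times, mags, errs = times[finite], mags[finite], errs[finite]

    median = np.median(mags)
    # assumption of this sketch: robust scatter from the MAD
    sigma = 1.483 * np.median(np.abs(mags - median))

    dim_sigma, bright_sigma = sigclip
    deviation = mags - median
    if magsarefluxes:
        # fainter means smaller flux
        keep = ((deviation > -dim_sigma * sigma) &
                (deviation < bright_sigma * sigma))
    else:
        # fainter means larger magnitude
        keep = ((deviation < dim_sigma * sigma) &
                (deviation > -bright_sigma * sigma))
    return times[keep], mags[keep], errs[keep]


rng = np.random.default_rng(42)
times = np.arange(200.0)
fluxes = 1.0 + 0.001 * rng.standard_normal(200)
errs = np.full(200, 0.001)
fluxes[10] = 0.950    # deep dimming
fluxes[20] = 1.006    # moderate brightening
fluxes[30] = 0.994    # moderate dimming
fluxes[40] = np.nan   # non-finite point

stimes, sfluxes, serrs = asymmetric_sigclip(
    times, fluxes, errs, sigclip=[10.0, 3.0], magsarefluxes=True)
print(times.size, '->', stimes.size)
```

With sigclip=[10., 3.] a moderate dimming survives while a brightening of the same size is removed, which is the usual choice when real dips (transits, eclipses) should be preserved and flares or cosmic-ray hits should not.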

astrobase.lcmath.
sigclip_magseries_with_extparams
(times, mags, errs, extparams, sigclip=None, iterative=False, magsarefluxes=False)[source]¶ Sigma-clips a magnitude or flux timeseries and associated measurement arrays.
Selects the finite times, magnitudes (or fluxes), and errors from the passed values, and applies symmetric or asymmetric sigma-clipping to them. Uses the same array indices as these values to filter out the values of all arrays in the extparams list. This can be useful for simultaneously sigma-clipping a magnitude/flux timeseries along with their associated values of external parameters, such as telescope hour angle, zenith distance, temperature, moon phase, etc.
Parameters:  times,mags,errs (np.array) –
The magnitude or flux timeseries arrays to sigma-clip. This doesn’t assume all values are finite or if they’re positive/negative. All of these arrays will have their non-finite elements removed, and then will be sigma-clipped based on the arguments to this function.
errs is optional. Set it to None if you don’t have values for these. A ‘faked’ errs array will be generated if necessary, which can be ignored in the output as well.
 extparams (list of np.array) – This is a list of all external parameter arrays to simultaneously filter along with the magnitude/flux timeseries. All of these arrays should have the same length as the times, mags, and errs arrays.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigma-clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 iterative (bool) – If this is set to True, will perform iterative sigma-clipping, repeating until no more points are removed.
 magsarefluxes (bool) – True if your “mags” are in fact fluxes, i.e. if “fainter” corresponds to mags getting smaller.
Returns: (stimes, smags, serrs, sextparams) – The sigma-clipped and nan-stripped timeseries in stimes, smags, serrs and the associated values of the extparams in sextparams.
Return type: tuple

astrobase.lcmath.
phase_magseries
(times, mags, period, epoch, wrap=True, sort=True)[source]¶ Phases a magnitude/flux timeseries using a given period and epoch.
The equation used is:
phase = (times - epoch)/period - floor((times - epoch)/period)
This phases the given magnitude timeseries using the given period and epoch. If wrap is True, wraps the result around 0.0 (and returns an array that has twice the number of the original elements). If sort is True, returns the magnitude timeseries in phase-sorted order.
Parameters:  times,mags (np.array) – The magnitude/flux timeseries values to phase using the provided period and epoch. Non-finite values will be removed.
 period (float) – The period to use to phase the timeseries.
 epoch (float) – The epoch to phase the timeseries. This is usually the time-of-minimum or time-of-maximum of some periodic light curve phenomenon. Alternatively, one can use the minimum time value in times.
 wrap (bool) – If this is True, the returned phased timeseries will be wrapped around phase 0.0, which is useful for plotting purposes. The arrays returned will have twice the number of input elements because of this wrapping.
 sort (bool) – If this is True, the returned phased timeseries will be sorted in increasing phase order.
Returns: A dict of the following form is returned:
{'phase': the phase values,
 'mags': the mags/flux values at each phase,
 'period': the input `period` used to phase the timeseries,
 'epoch': the input `epoch` used to phase the timeseries}
Return type: dict
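The phase equation above translates directly into NumPy. The sketch below uses a made-up function name and returns a dict shaped like the one described; the wrapping step (prepending a copy of the phases shifted by one cycle) is one straightforward way to obtain twice the number of elements around phase 0.0 and is an assumption of this example.

```python
import numpy as np


def phase_fold(times, mags, period, epoch, wrap=True, sort=True):
    """Phase a timeseries: phase = x - floor(x), x = (times - epoch)/period."""
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    finite = np.isfinite(times) & np.isfinite(mags)
    times, mags = times[finite], mags[finite]

    cycles = (times - epoch) / period
    phase = cycles - np.floor(cycles)          # always in [0, 1)

    if sort:
        order = np.argsort(phase)
        phase, mags = phase[order], mags[order]
    if wrap:
        # prepend a copy shifted by one cycle, so phases span [-1, 1)
        phase = np.concatenate((phase - 1.0, phase))
        mags = np.concatenate((mags, mags))
    return {'phase': phase, 'mags': mags, 'period': period, 'epoch': epoch}


times = np.array([0.0, 0.5, 1.25, 2.75, np.nan])
mags = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

folded = phase_fold(times, mags, period=1.0, epoch=0.0, wrap=False)
wrapped = phase_fold(times, mags, period=1.0, epoch=0.0, wrap=True)
print(folded['phase'], folded['mags'])
```

Subtracting the floor rather than taking a remainder keeps the phase in [0, 1) even for times earlier than the epoch.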

astrobase.lcmath.
phase_magseries_with_errs
(times, mags, errs, period, epoch, wrap=True, sort=True)[source]¶ Phases a magnitude/flux timeseries using a given period and epoch.
The equation used is:
phase = (times - epoch)/period - floor((times - epoch)/period)
This phases the given magnitude timeseries using the given period and epoch. If wrap is True, wraps the result around 0.0 (and returns an array that has twice the number of the original elements). If sort is True, returns the magnitude timeseries in phase-sorted order.
Parameters:  times,mags,errs (np.array) – The magnitude/flux timeseries values and associated measurement errors to phase using the provided period and epoch. Non-finite values will be removed.
 period (float) – The period to use to phase the timeseries.
 epoch (float) – The epoch to phase the timeseries. This is usually the time-of-minimum or time-of-maximum of some periodic light curve phenomenon. Alternatively, one can use the minimum time value in times.
 wrap (bool) – If this is True, the returned phased timeseries will be wrapped around phase 0.0, which is useful for plotting purposes. The arrays returned will have twice the number of input elements because of this wrapping.
 sort (bool) – If this is True, the returned phased timeseries will be sorted in increasing phase order.
Returns: A dict of the following form is returned:
{'phase': the phase values,
 'mags': the mags/flux values at each phase,
 'errs': the err values at each phase,
 'period': the input `period` used to phase the timeseries,
 'epoch': the input `epoch` used to phase the timeseries}
Return type: dict

astrobase.lcmath.
time_bin_magseries
(times, mags, binsize=540.0, minbinelems=7)[source]¶ Bins the given mag/flux timeseries in time using the bin size given.
Parameters:  times,mags (np.array) – The magnitude/flux timeseries to bin in time. Non-finite elements will be removed from these arrays. At least 10 elements in each array are required for this function to operate.
 binsize (float) – The bin size to use to group together measurements closer than this amount in time. This is in seconds.
 minbinelems (int) – The minimum number of elements required per bin to include it in the output.
Returns: A dict of the following form is returned:
{'jdbin_indices': a list of the index arrays into the nan-filtered input arrays per each bin,
 'jdbins': list of bin boundaries for each bin,
 'nbins': the number of bins generated,
 'binnedtimes': the time values associated with each time bin; this is the median of the times in each bin,
 'binnedmags': the mag/flux values associated with each time bin; this is the median of the mags/fluxes in each bin}
Return type: dict
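A simplified sketch of median binning in time follows. It assumes the times are in days (so the bin size in seconds is divided by 86400) and uses fixed-width bins anchored at the earliest time; both are assumptions of this example, and the library may group points differently. The function name is made up and only a subset of the documented dict keys is returned.

```python
import numpy as np


def time_bin_median(times, mags, binsize=540.0, minbinelems=7):
    """Median-bin a timeseries in fixed-width time bins (binsize in seconds)."""
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    finite = np.isfinite(times) & np.isfinite(mags)
    times, mags = times[finite], mags[finite]

    binsize_days = binsize / 86400.0             # seconds -> days
    bin_index = np.floor((times - times.min()) / binsize_days).astype(int)

    binnedtimes, binnedmags = [], []
    for b in np.unique(bin_index):
        members = bin_index == b
        if members.sum() >= minbinelems:         # skip sparsely populated bins
            binnedtimes.append(np.median(times[members]))
            binnedmags.append(np.median(mags[members]))
    return {'nbins': len(binnedtimes),
            'binnedtimes': np.array(binnedtimes),
            'binnedmags': np.array(binnedmags)}


times = np.arange(40) * 60.0 / 86400.0           # 1-minute cadence, in days
mags = 12.0 + 0.01 * np.sin(np.arange(40))
binned = time_bin_median(times, mags, binsize=630.0)
print(binned['nbins'], binned['binnedmags'])
```

Raising minbinelems discards bins with too few points, which matters near the ends of an observing block where bins are only partly filled.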

astrobase.lcmath.
time_bin_magseries_with_errs
(times, mags, errs, binsize=540.0, minbinelems=7)[source]¶ Bins the given mag/flux timeseries in time using the bin size given.
Parameters:  times,mags,errs (np.array) – The magnitude/flux timeseries and associated measurement errors to bin in time. Non-finite elements will be removed from these arrays. At least 10 elements in each array are required for this function to operate.
 binsize (float) – The bin size to use to group together measurements closer than this amount in time. This is in seconds.
 minbinelems (int) – The minimum number of elements required per bin to include it in the output.
Returns: A dict of the following form is returned:
{'jdbin_indices': a list of the index arrays into the nan-filtered input arrays per each bin,
 'jdbins': list of bin boundaries for each bin,
 'nbins': the number of bins generated,
 'binnedtimes': the time values associated with each time bin; this is the median of the times in each bin,
 'binnedmags': the mag/flux values associated with each time bin; this is the median of the mags/fluxes in each bin,
 'binnederrs': the err values associated with each time bin; this is the median of the errs in each bin}
Return type: dict

astrobase.lcmath.
phase_bin_magseries
(phases, mags, binsize=0.005, minbinelems=7)[source]¶ Bins a phased magnitude/flux timeseries using the bin size provided.
Parameters:  phases,mags (np.array) – The phased magnitude/flux timeseries to bin in phase. Non-finite elements will be removed from these arrays. At least 10 elements in each array are required for this function to operate.
 binsize (float) – The bin size to use to group together measurements closer than this amount in phase. This is in units of phase.
 minbinelems (int) – The minimum number of elements required per bin to include it in the output.
Returns: A dict of the following form is returned:
{'phasebin_indices': a list of the index arrays into the nan-filtered input arrays per each bin,
 'phasebins': list of bin boundaries for each bin,
 'nbins': the number of bins generated,
 'binnedphases': the phase values associated with each phase bin; this is the median of the phase value in each bin,
 'binnedmags': the mag/flux values associated with each phase bin; this is the median of the mags/fluxes in each bin}
Return type: dict

astrobase.lcmath.
phase_bin_magseries_with_errs
(phases, mags, errs, binsize=0.005, minbinelems=7, weights=None)[source]¶ Bins a phased magnitude/flux timeseries using the bin size provided.
Parameters:  phases,mags,errs (np.array) – The phased magnitude/flux timeseries and associated errs to bin in phase. Non-finite elements will be removed from these arrays. At least 10 elements in each array are required for this function to operate.
 binsize (float) – The bin size to use to group together measurements closer than this amount in phase. This is in units of phase.
 minbinelems (int) – The minimum number of elements required per bin to include it in the output.
 weights (np.array or None) – Optional weight vector to be applied during binning. If it is passed, np.average is used to bin, rather than np.median. A good choice would be to pass weights=1/errs**2, to weight by the inverse variance.
Returns: A dict of the following form is returned:
{'phasebin_indices': a list of the index arrays into the nan-filtered input arrays per each bin,
 'phasebins': list of bin boundaries for each bin,
 'nbins': the number of bins generated,
 'binnedphases': the phase values associated with each phase bin; this is the median of the phase value in each bin,
 'binnedmags': the mag/flux values associated with each phase bin; this is the median of the mags/fluxes in each bin,
 'binnederrs': the err values associated with each phase bin; this is the median of the errs in each bin}
Return type: dict
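To illustrate the effect of the weights argument, the sketch below bins in phase with np.average and inverse-variance weights. The function name and the fixed-width binning are assumptions of this example; it expects finite phases in [0, 1) and returns only the binned phases and mags.

```python
import numpy as np


def phase_bin_weighted(phases, mags, errs, binsize=0.005, minbinelems=7):
    """Bin in phase using an inverse-variance weighted average per bin."""
    phases, mags, errs = (np.asarray(a, dtype=float)
                          for a in (phases, mags, errs))
    weights = 1.0 / errs**2                      # inverse variance
    bin_index = np.floor(phases / binsize).astype(int)

    binnedphases, binnedmags = [], []
    for b in np.unique(bin_index):
        members = bin_index == b
        if members.sum() < minbinelems:
            continue
        binnedphases.append(np.median(phases[members]))
        binnedmags.append(np.average(mags[members],
                                     weights=weights[members]))
    return np.array(binnedphases), np.array(binnedmags)


# eight points in one bin; the last is discrepant but has a large error bar
phases = np.linspace(0.01, 0.09, 8)
mags = np.array([1.0] * 7 + [2.0])
errs = np.array([0.01] * 7 + [1.0])

bphases, bmags = phase_bin_weighted(phases, mags, errs, binsize=0.1)
print(bmags, np.mean(mags))
```

The poorly measured point barely moves the weighted average, whereas it shifts the unweighted mean noticeably.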

astrobase.lcmath.
fill_magseries_gaps
(times, mags, errs, fillgaps=0.0, sigclip=3.0, magsarefluxes=False, filterwindow=11, forcetimebin=None, verbose=True)[source]¶ This fills in gaps in a light curve.
This is mainly intended for use in ACF period-finding, but may be useful otherwise (i.e. when we figure out ARMA stuff for LCs). The main steps here are:
 normalize the light curve to zero
 remove giant outliers
 interpolate gaps in the light curve (since ACF requires evenly spaced sampling)
From McQuillan+ 2013a (https://doi.org/10.1093/mnras/stt536):
“The ACF calculation requires the light curves to be regularly sampled and normalized to zero. We divided the flux in each quarter by its median and subtracted unity. Gaps in the light curve longer than the Kepler long cadence were filled using linear interpolation with added white Gaussian noise. This noise level was estimated using the variance of the residuals following subtraction of a smoothed version of the flux. To smooth the flux, we applied an iterative non-linear filter which consists of a median filter followed by a boxcar filter, both with 11-point windows, with iterative 3σ clipping of outliers.”
Parameters:  times,mags,errs (np.array) – The magnitude/flux timeseries and associated measurement errors to operate on. Non-finite elements will be removed from these arrays. At least 10 elements in each array are required for this function to operate.
 fillgaps ({'noiselevel', 'nan'} or float) – If fillgaps=’noiselevel’, fills the gaps with the noise level obtained via the procedure above. If fillgaps=’nan’, fills the gaps with np.nan. Otherwise, if fillgaps is a float, will use that value to fill the gaps. The default is to fill the gaps with 0.0 (as in McQuillan+ 2014) to “…prevent them contributing to the ACF”.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigma-clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 magsarefluxes (bool) – True if your “mags” are in fact fluxes, i.e. if “fainter” corresponds to mags getting smaller.
 filterwindow (int) – The number of timeseries points to include in the Savitzky-Golay filter operation when smoothing the light curve. This should be an odd integer.
 forcetimebin (float or None) – If forcetimebin is a float, this value will be used to generate the interpolated time series, effectively binning the light curve to this cadence. If forcetimebin is None, the mode of the gaps (the forward difference between successive time values in times) in the provided light curve will be used as the effective cadence. NOTE: forcetimebin must be in the same units as times, e.g. if times are JD then forcetimebin must be in days as well.
 verbose (bool) – If this is True, will indicate progress at various stages in the operation.
Returns: A dict of the following form is returned:
{'itimes': the interpolated time values after gap-filling,
 'imags': the interpolated mag/flux values after gap-filling,
 'ierrs': the interpolated mag/flux errors after gap-filling,
 'cadence': the cadence of the output mag/flux timeseries}
Return type: dict
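The cadence selection and gap filling described above can be sketched with plain NumPy. This is an illustrative sketch under stated assumptions, not the Astrobase implementation: the helper name fill_gaps_sketch is hypothetical, it supports only a constant fill value, and it assumes that times is sorted, that all inputs are finite, and that mags has already been normalized to zero.

```python
import numpy as np


def fill_gaps_sketch(times, mags, fillvalue=0.0, forcetimebin=None):
    """Put a light curve on a regular time grid, filling gaps with fillvalue.

    Hypothetical helper for illustration only; assumes `times` is sorted
    and that all inputs are finite.
    """
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)

    if forcetimebin is not None:
        cadence = float(forcetimebin)
    else:
        # use the most common forward difference between successive times
        diffs = np.round(np.diff(times), 6)
        values, counts = np.unique(diffs, return_counts=True)
        cadence = float(values[np.argmax(counts)])

    # regular grid spanning the observations
    nsteps = int(np.round((times[-1] - times[0]) / cadence)) + 1
    itimes = times[0] + cadence * np.arange(nsteps)

    # start with every slot set to the fill value, then drop each
    # measurement into its nearest grid slot
    imags = np.full(nsteps, fillvalue, dtype=float)
    slots = np.round((times - times[0]) / cadence).astype(int)
    imags[slots] = mags

    return {'itimes': itimes, 'imags': imags, 'cadence': cadence}


# a 0.5-day cadence light curve with a gap between t=1.0 and t=3.0
times = np.array([0.0, 0.5, 1.0, 3.0, 3.5, 4.0])
mags = np.array([0.1, -0.2, 0.3, 0.2, -0.1, 0.1])
filled = fill_gaps_sketch(times, mags)
```

Here the three missing grid points between t=1.0 and t=3.0 receive the fill value, so they add nothing to any lagged product in a later ACF calculation.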
astrobase.lcmodels package¶
This contains various light curve models for variable stars. Useful for first order fits to distinguish between variable types, and for generating these variables’ light curves for a recovery simulation.
astrobase.lcmodels.transits: trapezoid-shaped planetary transit light curves.
astrobase.lcmodels.eclipses: double inverted-gaussian shaped eclipsing binary light curves.
astrobase.lcmodels.flares: stellar flare model from Pitkin+ 2014.
astrobase.lcmodels.sinusoidal: sinusoidal light curve generation for pulsating variables.
Submodules¶
astrobase.lcmodels.eclipses module¶
This contains a double gaussian model for first order modeling of eclipsing binaries.

astrobase.lcmodels.eclipses.
invgauss_eclipses_func
(ebparams, times, mags, errs)[source]¶ This returns a double eclipse shaped function.
Suitable for first order modeling of eclipsing binaries.
Parameters:  ebparams (list of float) –
This contains the parameters for the eclipsing binary:
ebparams = [period (time), epoch (time), pdepth: primary eclipse depth (mags), pduration: primary eclipse duration (phase), psdepthratio: primarysecondary eclipse depth ratio, secondaryphase: center phase of the secondary eclipse]
period is the period in days.
epoch is the time of minimum in JD.
pdepth is the depth of the primary eclipse.
 for magnitudes -> pdepth should be < 0
 for fluxes -> pdepth should be > 0
pduration is the length of the primary eclipse in phase.
psdepthratio is the ratio in the eclipse depths: depth_secondary/depth_primary. This is generally the same as the ratio of the T_effs of the two stars.
secondaryphase is the phase at which the minimum of the secondary eclipse is located. This effectively parameterizes eccentricity.
All of these will then have fitted values after the fit is done.
 times,mags,errs (np.array) – The input timeseries of measurements and associated errors for which the eclipse model will be generated. The times will be used to generate model mags, and the input times, mags, and errs will be resorted by model phase and returned.
Returns: (modelmags, phase, ptimes, pmags, perrs) – Returns the model mags and phase values. Also returns the input times, mags, and errs sorted by the model’s phase.
Return type: tuple
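The shape of a double inverted-Gaussian eclipse model can be sketched as follows. This is a minimal sketch and not the Astrobase implementation: the function name double_invgauss_sketch is hypothetical, and treating pduration as the Gaussian sigma in phase units is an assumption made here for illustration.

```python
import numpy as np


def double_invgauss_sketch(ebparams, times, zerolevel=0.0):
    """Evaluate a double inverted-Gaussian eclipse model at `times`.

    Hypothetical illustration; treats pduration as the Gaussian sigma in
    phase units, which is an assumption of this sketch.
    """
    period, epoch, pdepth, pduration, psdepthratio, secondaryphase = ebparams

    # phase in [0, 1), with the primary eclipse at phase 0
    phase = ((np.asarray(times, dtype=float) - epoch) / period) % 1.0

    def dip(center, depth):
        # wrap the phase distance so that points just below phase 1
        # also see an eclipse centered on phase 0
        dphi = (phase - center + 0.5) % 1.0 - 0.5
        return depth * np.exp(-0.5 * (dphi / pduration) ** 2)

    primary = dip(0.0, pdepth)
    secondary = dip(secondaryphase, pdepth * psdepthratio)

    # subtracting means a positive depth lowers the value (fluxes) and a
    # negative depth raises it (magnitudes)
    return zerolevel - primary - secondary, phase


times = np.array([0.0, 0.5, 1.0, 1.5])
ebparams = [2.0, 0.0, -0.5, 0.05, 0.4, 0.5]   # depths in magnitudes
model, phase = double_invgauss_sketch(ebparams, times)
```

With a negative pdepth, the model rises by 0.5 mag at the primary eclipse and by 0.2 mag at the secondary, matching the sign convention for magnitudes given above.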

astrobase.lcmodels.eclipses.
invgauss_eclipses_curvefit_func
(times, period, epoch, pdepth, pduration, psdepthratio, secondaryphase, zerolevel=0.0, fixed_params=None)[source]¶ This is the invgauss eclipses function used with scipy.optimize.curve_fit.
Parameters:  times (np.array) – The array of times at which the model will be evaluated.
 period (float) – The period of the eclipsing binary.
 epoch (float) – The mid eclipse time of the primary eclipse. In the same units as times.
 pdepth (float) – The depth of the primary eclipse.
 pduration (float) – The duration of the primary eclipse. In units of phase.
 psdepthratio (float) – The ratio between the depths of the primary and secondary eclipse.
 secondaryphase (float) – The phase of the secondary eclipse.
 zerolevel (float) – The out of eclipse value of the model.
 fixed_params (dict or None) –
If this is provided, must be a dict containing the parameters to fix and their values. Should be of the form below:
{'period': fixed value, 'epoch': fixed value, 'pdepth': fixed value, 'pduration': fixed value, 'psdepthratio': fixed value, 'secondaryphase': fixed value}
Any parameter in the dict provided will have its parameter fixed to the provided value. This is best done with an application of functools.partial before passing the function to the scipy.optimize.curve_fit function, e.g.:
curvefit_func = functools.partial(
    eclipses.invgauss_eclipses_curvefit_func,
    zerolevel=np.median(mags),
    fixed_params={'secondaryphase': 0.5})
fit_params, fit_cov = scipy.optimize.curve_fit(
    curvefit_func, times, mags, p0=initial_params, sigma=errs, ...)
Returns: model – Returns the eclipse model as an np.array. This is in the same order as the times input array.
Return type: np.array

astrobase.lcmodels.eclipses.
invgauss_eclipses_residual
(ebparams, times, mags, errs)[source]¶ This returns the residual between the modelmags and the actual mags.
Parameters:  ebparams (list of float) –
This contains the parameters for the eclipsing binary:
ebparams = [period (time), epoch (time), pdepth: primary eclipse depth (mags), pduration: primary eclipse duration (phase), psdepthratio: primarysecondary eclipse depth ratio, secondaryphase: center phase of the secondary eclipse]
period is the period in days.
epoch is the time of minimum in JD.
pdepth is the depth of the primary eclipse.
 for magnitudes -> pdepth should be < 0
 for fluxes -> pdepth should be > 0
pduration is the length of the primary eclipse in phase.
psdepthratio is the ratio in the eclipse depths: depth_secondary/depth_primary. This is generally the same as the ratio of the T_effs of the two stars.
secondaryphase is the phase at which the minimum of the secondary eclipse is located. This effectively parameterizes eccentricity.
All of these will then have fitted values after the fit is done.
 times,mags,errs (np.array) – The input timeseries of measurements and associated errors for which the eclipse model will be generated. The times will be used to generate model mags, and the input times, mags, and errs will be resorted by model phase and returned.
Returns: The residuals between the input mags and generated modelmags, weighted by the measurement errors in errs.
Return type: np.array
astrobase.lcmodels.flares module¶
This contains a stellar flare model from Pitkin+ 2014.
http://adsabs.harvard.edu/abs/2014MNRAS.445.2268P

astrobase.lcmodels.flares.
flare_model
(flareparams, times, mags, errs)[source]¶ This is a flare model function, similar to Kowalski+ 2011.
From the paper by Pitkin+ 2014: http://adsabs.harvard.edu/abs/2014MNRAS.445.2268P
Parameters:  flareparams (list of float) –
This defines the flare model:
[amplitude, flare_peak_time, rise_gaussian_stdev, decay_time_constant]
where:
amplitude: the maximum flare amplitude in mags or flux. If flux, then amplitude should be positive. If mags, amplitude should be negative.
flare_peak_time: time at which the flare maximum happens.
rise_gaussian_stdev: the stdev of the gaussian describing the rise of the flare.
decay_time_constant: the time constant of the exponential fall of the flare.
 times,mags,errs (np.array) – The input timeseries of measurements and associated errors for which the model will be generated. The times will be used to generate model mags.
Returns: (modelmags, times, mags, errs) – Returns the model mags evaluated at the input time values. Also returns the input times, mags, and errs.
Return type: tuple
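The four flare parameters map onto a Gaussian rise followed by an exponential decay. The sketch below illustrates that shape only; flare_sketch is a hypothetical name, and the way the two branches are joined at the peak is an assumption of this example rather than a statement about the Astrobase code.

```python
import numpy as np


def flare_sketch(flareparams, times, zerolevel=0.0):
    """Gaussian rise up to the peak, exponential decay after it.

    Hypothetical illustration of the model shape described above.
    """
    amplitude, peak_time, rise_stdev, decay_const = flareparams
    times = np.asarray(times, dtype=float)
    dt = times - peak_time

    rise = amplitude * np.exp(-0.5 * (dt / rise_stdev) ** 2)
    # clip dt at zero so the unused branch cannot overflow before the peak
    decay = amplitude * np.exp(-np.clip(dt, 0.0, None) / decay_const)

    return zerolevel + np.where(dt < 0.0, rise, decay)


times = np.linspace(0.0, 10.0, 101)
flux = flare_sketch([2.0, 3.0, 0.2, 1.5], times)   # positive amplitude: fluxes
```

For magnitudes, the same function would be called with a negative amplitude so that the flare makes the star brighter.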

astrobase.lcmodels.flares.
flare_model_residual
(flareparams, times, mags, errs)[source]¶ This returns the residual between model mags and the actual mags.
Parameters:  flareparams (list of float) –
This defines the flare model:
[amplitude, flare_peak_time, rise_gaussian_stdev, decay_time_constant]
where:
amplitude: the maximum flare amplitude in mags or flux. If flux, then amplitude should be positive. If mags, amplitude should be negative.
flare_peak_time: time at which the flare maximum happens.
rise_gaussian_stdev: the stdev of the gaussian describing the rise of the flare.
decay_time_constant: the time constant of the exponential fall of the flare.
 times,mags,errs (np.array) – The input timeseries of measurements and associated errors for which the model will be generated. The times will be used to generate model mags.
Returns: The residuals between the input mags and generated modelmags, weighted by the measurement errors in errs.
Return type: np.array
astrobase.lcmodels.sinusoidal module¶
This contains models for sinusoidal light curves generated using Fourier expansion.

astrobase.lcmodels.sinusoidal.
fourier_sinusoidal_func
(fourierparams, times, mags, errs)[source]¶ This generates a sinusoidal light curve using a Fourier cosine series.
Parameters:  fourierparams (list) –
This MUST be a list of the following form:
[period, epoch, [amplitude_1, amplitude_2, amplitude_3, ..., amplitude_X], [phase_1, phase_2, phase_3, ..., phase_X]]
where X is the Fourier order.
 times,mags,errs (np.array) – The input timeseries of measurements and associated errors for which the model will be generated. The times will be used to generate model mags, and the input times, mags, and errs will be resorted by model phase and returned.
Returns: (modelmags, phase, ptimes, pmags, perrs) – Returns the model mags and phase values. Also returns the input times, mags, and errs sorted by the model’s phase.
Return type: tuple
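A Fourier cosine series built from a fourierparams list of this form can be sketched as below. The name fourier_cosine_sketch is hypothetical, and numbering the harmonics from 1 is an assumption of this example; the Astrobase function may index them differently.

```python
import numpy as np


def fourier_cosine_sketch(fourierparams, times, zerolevel=0.0):
    """Sum of cosine harmonics evaluated at the phase of each time.

    Hypothetical illustration; harmonics are numbered from 1 here, which
    is an assumption of this sketch.
    """
    period, epoch, amplitudes, phases = fourierparams
    phase = ((np.asarray(times, dtype=float) - epoch) / period) % 1.0

    model = np.full_like(phase, zerolevel)
    for k, (amp, pha) in enumerate(zip(amplitudes, phases), start=1):
        model += amp * np.cos(2.0 * np.pi * k * phase + pha)

    # sort by phase, as the documented function does for its outputs
    order = np.argsort(phase)
    return model[order], phase[order], order


times = np.array([0.0, 0.75, 1.5, 2.25])
params = [3.0, 0.0, [1.0, 0.5], [0.0, 0.0]]   # Fourier order 2
model, phase, order = fourier_cosine_sketch(params, times)
```

The returned order array can be used to sort the input times, mags, and errs into the same phase order as the model.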

astrobase.lcmodels.sinusoidal.
fourier_curvefit_func
(times, period, *fourier_coeffs, zerolevel=0.0, epoch=None, fixed_period=None)[source]¶ This is a function to be used with scipy.optimize.curve_fit.
Parameters:  times (np.array) – An array of times at which the model will be evaluated.
 period (float) – The period of the sinusoidal variability.
 fourier_coeffs (float) – These should be the amplitudes and phases of the sinusoidal series sum. 2N coefficients are required for Fourier order = N. The first N coefficients will be used as the amplitudes and the second N coefficients will be used as the phases.
 zerolevel (float) – The base level of the model.
 epoch (float or None) – The epoch to use to generate the phased light curve. If None, the minimum value of the times array will be used.
 fixed_period (float or None) – If not None, will indicate that the period is to be held fixed at the provided value.
Returns: model – Returns the sinusoidal series sum model evaluated at each value of times.
Return type: np.array

astrobase.lcmodels.sinusoidal.
fourier_sinusoidal_residual
(fourierparams, times, mags, errs)[source]¶ This returns the residual between the model mags and the actual mags.
Parameters:  fourierparams (list) –
This MUST be a list of the following form:
[period, epoch, [amplitude_1, amplitude_2, amplitude_3, ..., amplitude_X], [phase_1, phase_2, phase_3, ..., phase_X]]
where X is the Fourier order.
 times,mags,errs (np.array) – The input timeseries of measurements and associated errors for which the model will be generated. The times will be used to generate model mags, and the input times, mags, and errs will be resorted by model phase and returned.
Returns: The residuals between the input mags and generated modelmags, weighted by the measurement errors in errs.
Return type: np.array

astrobase.lcmodels.sinusoidal.
sine_series_sum
(fourierparams, times, mags, errs)[source]¶ This generates a sinusoidal light curve using a Fourier sine series.
Parameters:  fourierparams (list) –
This MUST be a list of the following form:
[period, epoch, [amplitude_1, amplitude_2, amplitude_3, ..., amplitude_X], [phase_1, phase_2, phase_3, ..., phase_X]]
where X is the Fourier order.
 times,mags,errs (np.array) – The input timeseries of measurements and associated errors for which the model will be generated. The times will be used to generate model mags, and the input times, mags, and errs will be resorted by model phase and returned.
Returns: (modelmags, phase, ptimes, pmags, perrs) – Returns the model mags and phase values. Also returns the input times, mags, and errs sorted by the model’s phase.
Return type: tuple
astrobase.lcmodels.transits module¶
This contains a trapezoid model for first-order modeling of planetary transit light curves.

astrobase.lcmodels.transits.
trapezoid_transit_func
(transitparams, times, mags, errs, get_ntransitpoints=False)[source]¶ This returns a trapezoid transit-shaped function.
Suitable for first order modeling of transit signals.
Parameters:  transitparams (list of float) –
This contains the transiting planet trapezoid model:
transitparams = [transitperiod (time), transitepoch (time), transitdepth (flux or mags), transitduration (phase), ingressduration (phase)]
All of these will then have fitted values after the fit is done.
 for magnitudes -> transitdepth should be < 0
 for fluxes -> transitdepth should be > 0
 times,mags,errs (np.array) – The input timeseries of measurements and associated errors for which the transit model will be generated. The times will be used to generate model mags, and the input times, mags, and errs will be resorted by model phase and returned.
Returns: (modelmags, phase, ptimes, pmags, perrs) – Returns the model mags and phase values. Also returns the input times, mags, and errs sorted by the model’s phase.
Return type: tuple
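A trapezoid transit shape built from these five parameters can be sketched as follows. This is an illustration, not the Astrobase implementation: trapezoid_sketch is a hypothetical name, and it assumes that transitduration is the full first-to-last-contact width in phase and that ingressduration is the width of each sloped edge.

```python
import numpy as np


def trapezoid_sketch(transitparams, times, zerolevel=0.0):
    """Trapezoid-shaped transit model evaluated at `times`.

    Hypothetical illustration; assumes transitduration is the full
    first-to-last-contact width in phase and ingressduration is the
    width of each sloped edge, with 2*ingressduration <= transitduration.
    """
    period, epoch, depth, duration, ingress = transitparams
    times = np.asarray(times, dtype=float)

    # phase distance from mid-transit, wrapped into [-0.5, 0.5)
    phase = ((times - epoch) / period + 0.5) % 1.0 - 0.5
    dist = np.abs(phase)

    half = duration / 2.0
    # fraction of the full depth reached: 0 outside transit, 1 on the
    # flat bottom, and a linear ramp across ingress and egress
    frac = np.clip((half - dist) / ingress, 0.0, 1.0)

    return zerolevel - depth * frac, phase


times = np.array([-0.5, -0.09, 0.0, 0.09, 0.5])
model, phase = trapezoid_sketch([1.0, 0.0, 0.01, 0.2, 0.02], times,
                                zerolevel=1.0)
```

In this flux example the positive depth lowers the model to 0.99 at mid-transit and to 0.995 halfway through ingress and egress.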

astrobase.lcmodels.transits.
trapezoid_transit_curvefit_func
(times, period, epoch, depth, duration, ingressduration, zerolevel=0.0, fixed_params=None)[source]¶ This is the function used for scipy.optimize.curve_fit.
Parameters:  times (np.array) – The array of times used to construct the transit model.
 period (float) – The period of the transit.
 epoch (float) – The time of midtransit (phase 0.0). Must be in the same units as times.
 depth (float) – The depth of the transit.
 duration (float) – The duration of the transit in phase units.
 ingressduration (float) – The ingress duration of the transit in phase units.
 zerolevel (float) – The level of the measurements outside transit.
 fixed_params (dict or None) –
If this is provided, must be a dict containing the parameters to fix and their values. Should be of the form below:
{'period': fixed value, 'epoch': fixed value, 'depth': fixed value, 'duration': fixed value, 'ingressduration': fixed value}
Any parameter in the dict provided will have its parameter fixed to the provided value. This is best done with an application of functools.partial before passing the function to the scipy.optimize.curve_fit function, e.g.:
curvefit_func = functools.partial(
    transits.trapezoid_transit_curvefit_func,
    zerolevel=np.median(mags),
    fixed_params={'ingressduration': 0.05})
fit_params, fit_cov = scipy.optimize.curve_fit(
    curvefit_func, times, mags, p0=initial_params, sigma=errs, ...)
Returns: model – Returns the transit model as an np.array. This is in the same order as the times input array.
Return type: np.array

astrobase.lcmodels.transits.
trapezoid_transit_residual
(transitparams, times, mags, errs)[source]¶ This returns the residual between the modelmags and the actual mags.
Parameters:  transitparams (list of float) –
This contains the transiting planet trapezoid model:
transitparams = [transitperiod (time), transitepoch (time), transitdepth (flux or mags), transitduration (phase), ingressduration (phase)]
All of these will then have fitted values after the fit is done.
 for magnitudes -> transitdepth should be < 0
 for fluxes -> transitdepth should be > 0
 times,mags,errs (np.array) – The input timeseries of measurements and associated errors for which the transit model will be generated. The times will be used to generate model mags, and the input times, mags, and errs will be resorted by model phase and returned.
Returns: The residuals between the input mags and generated modelmags, weighted by the measurement errors in errs.
Return type: np.array
astrobase.varbase package¶
Contains functions to deal with light curve variability, fitting functions, masking signals, autocorrelation, etc.
astrobase.varbase.autocorr: calculating the autocorrelation function of light curves.
astrobase.varbase.signals: masking periodic signals, pre-whitening light curves.
astrobase.varbase.transits: light curve tools specifically for planetary transits.
astrobase.varbase.trends: tools for running External Parameter Decorrelation (EPD) on light curves.
FIXME: finish up the astrobase.varbase.flares module to find flares in LCs.
Submodules¶
astrobase.varbase.autocorr module¶
Calculates the autocorrelation for magnitude time series.

astrobase.varbase.autocorr.
_autocorr_func1
(mags, lag, maglen, magmed, magstd)[source]¶ Calculates the autocorr of mag series for specific lag.
This version of the function is taken from: Kim et al. (2011)
Parameters:  mags (np.array) – This is the magnitudes array. MUST NOT have any nans.
 lag (float) – The specific lag value to calculate the autocorrelation for. This MUST be less than total number of observations in mags.
 maglen (int) – The number of elements in the mags array.
 magmed (float) – The median of the mags array.
 magstd (float) – The standard deviation of the mags array.
Returns: The autocorrelation at this specific lag value.
Return type: float

astrobase.varbase.autocorr.
_autocorr_func2
(mags, lag, maglen, magmed, magstd)[source]¶ This is an alternative function to calculate the autocorrelation.
This version is from (first definition):
https://en.wikipedia.org/wiki/Correlogram#Estimation_of_autocorrelations
Parameters:  mags (np.array) – This is the magnitudes array. MUST NOT have any nans.
 lag (float) – The specific lag value to calculate the autocorrelation for. This MUST be less than total number of observations in mags.
 maglen (int) – The number of elements in the mags array.
 magmed (float) – The median of the mags array.
 magstd (float) – The standard deviation of the mags array.
Returns: The autocorrelation at this specific lag value.
Return type: float
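For reference, a single-lag autocorrelation estimate of the kind these helpers compute can be sketched as below. The function autocorr_at_lag is hypothetical; it follows the general form of the correlogram estimate (a sum of lagged products divided by the number of pairs and the variance) and uses the median as the central value, which is an assumption of this sketch.

```python
import numpy as np


def autocorr_at_lag(mags, lag):
    """Sample autocorrelation of `mags` at an integer `lag`.

    Hypothetical illustration using the median as the central value and
    the standard deviation for normalization; `mags` must be finite.
    """
    mags = np.asarray(mags, dtype=float)
    n = mags.size
    if not 0 <= lag < n:
        raise ValueError('lag must be in the range [0, len(mags))')

    resid = mags - np.median(mags)
    # product of each point with the point `lag` samples later
    products = resid[:n - lag] * resid[lag:]
    return products.sum() / ((n - lag) * np.std(mags) ** 2)


# a noiseless sinusoid sampled 20 times per cycle
t = np.arange(200)
mags = np.sin(2.0 * np.pi * t / 20.0)
acf = np.array([autocorr_at_lag(mags, lag) for lag in range(41)])
```

For this periodic input the ACF returns to 1 at lags equal to multiples of the period (20 samples) and reaches -1 at half-period lags, which is the property that makes the ACF useful for period finding.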

astrobase.varbase.autocorr.
_autocorr_func3
(mags, lag, maglen, magmed, magstd)[source]¶ This is yet another alternative to calculate the autocorrelation.
Taken from: Bayesian Methods for Hackers by Cameron Davidson-Pilon
(This should be the fastest method to calculate ACFs.)
Parameters:  mags (np.array) – This is the magnitudes array. MUST NOT have any nans.
 lag (float) – The specific lag value to calculate the autocorrelation for. This MUST be less than total number of observations in mags.
 maglen (int) – The number of elements in the mags array.
 magmed (float) – The median of the mags array.
 magstd (float) – The standard deviation of the mags array.
Returns: The autocorrelation at this specific lag value.
Return type: float

astrobase.varbase.autocorr.
autocorr_magseries
(times, mags, errs, maxlags=1000, func=<function _autocorr_func3>, fillgaps=0.0, filterwindow=11, forcetimebin=None, sigclip=3.0, magsarefluxes=False, verbose=True)[source]¶ This calculates the ACF of a light curve.
This will preprocess the light curve to fill in all the gaps and normalize everything to zero. If fillgaps = 'noiselevel', fills the gaps with a noise level estimated from the residuals left after subtracting a smoothed version of the light curve (see fillgaps below). If fillgaps = 'nan', fills the gaps with np.nan.
Parameters:  times,mags,errs (np.array) – The measurement timeseries and associated errors.
 maxlags (int) – The maximum number of lags to calculate.
 func (Python function) – This is a function to calculate the lags.
 fillgaps ('noiselevel' or float) – This sets what to use to fill in gaps in the time series. If this is 'noiselevel', will smooth the light curve using a point window size of filterwindow (this should be an odd integer), subtract the smoothed LC from the actual LC and estimate the RMS. This RMS will be used to fill in the gaps. Other useful values here are 0.0 and np.nan.
 filterwindow (int) – The light curve’s smoothing filter window size to use if fillgaps=’noiselevel’.
 forcetimebin (None or float) – This is used to force a particular cadence in the light curve other than the automatically determined cadence. This effectively rebins the light curve to this cadence. This should be in the same time units as times.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will clip out dimmings greater than 10-sigma and brightenings greater than 3-sigma. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elements removed) will be passed through to the output.
 magsarefluxes (bool) – If your input measurements in mags are actually fluxes instead of mags, set this to True.
 verbose (bool) – If True, will indicate progress and report errors.
Returns: A dict of the following form is returned:
{'itimes': the interpolated time values after gap-filling,
 'imags': the interpolated mag/flux values after gap-filling,
 'ierrs': the interpolated mag/flux errors after gap-filling,
 'cadence': the cadence of the output mag/flux timeseries,
 'minitime': the minimum value of the interpolated times array,
 'lags': the lags used to calculate the autocorrelation function,
 'acf': the value of the ACF at each lag used}
Return type: dict
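The asymmetric form of the sigclip argument is easiest to see in code. The sketch below is a hypothetical stand-in (sigclip_sketch is not an Astrobase function), and using 1.4826 times the median absolute deviation as the sigma estimate is an assumption made for this example.

```python
import numpy as np


def sigclip_sketch(times, mags, errs, sigclip=3.0, magsarefluxes=False):
    """Symmetric or asymmetric sigma clip about the median.

    Hypothetical illustration; uses 1.4826 * MAD as the sigma estimate,
    which is an assumption of this sketch.
    """
    times, mags, errs = (np.asarray(x, dtype=float)
                         for x in (times, mags, errs))
    finite = np.isfinite(times) & np.isfinite(mags) & np.isfinite(errs)
    times, mags, errs = times[finite], mags[finite], errs[finite]
    if sigclip is None:
        return times, mags, errs

    if np.isscalar(sigclip):
        dimming_sigma = brightening_sigma = float(sigclip)
    else:
        dimming_sigma, brightening_sigma = (float(s) for s in sigclip)

    median = np.median(mags)
    sigma = 1.4826 * np.median(np.abs(mags - median))
    # positive deviation = brighter: larger flux, or smaller magnitude
    brightness = (mags - median) if magsarefluxes else (median - mags)

    keep = ((brightness <= brightening_sigma * sigma) &
            (brightness >= -dimming_sigma * sigma))
    return times[keep], mags[keep], errs[keep]


rng = np.random.default_rng(42)
times = np.arange(500.0)
mags = 12.0 + 0.01 * rng.standard_normal(500)
mags[100] = 12.06    # roughly a 6-sigma dimming (larger magnitude)
mags[200] = 11.94    # roughly a 6-sigma brightening (smaller magnitude)
errs = np.full(500, 0.01)
ctimes, cmags, cerrs = sigclip_sketch(times, mags, errs, sigclip=[10.0, 3.0])
```

With sigclip=[10.0, 3.0], the dimming at index 100 survives while the brightening at index 200 is removed, which is the behavior one typically wants when eclipses or transits must be preserved.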
astrobase.varbase.flares module¶
Contains functions to deal with finding stellar flares in time series.
FIXME: finish this module.

astrobase.varbase.flares.
add_flare_model
(flareparams, times, mags, errs)[source]¶ This adds a flare model function to the input magnitude/flux timeseries.
Parameters:  flareparams (list of float) –
This defines the flare model:
[amplitude, flare_peak_time, rise_gaussian_stdev, decay_time_constant]
where:
amplitude: the maximum flare amplitude in mags or flux. If flux, then amplitude should be positive. If mags, amplitude should be negative.
flare_peak_time: time at which the flare maximum happens.
rise_gaussian_stdev: the stdev of the gaussian describing the rise of the flare.
decay_time_constant: the time constant of the exponential fall of the flare.
 times,mags,errs (np.array) – The input timeseries of measurements and associated errors for which the model will be generated. The times will be used to generate model mags.
 magsarefluxes (bool) – Sets the correct direction of the flare amplitude (+ve) for fluxes if True and for mags (-ve) if False.
Returns: A dict of the form below is returned:
{'times': the original times array,
 'mags': the original mags + the flare model mags evaluated at times,
 'errs': the original errs array,
 'flareparams': the input list of flare params}
Return type: dict

astrobase.varbase.flares.
simple_flare_find
(times, mags, errs, smoothbinsize=97, flare_minsigma=4.0, flare_maxcadencediff=1, flare_mincadencepoints=3, magsarefluxes=False, savgol_polyorder=2, **savgol_kwargs)[source]¶ This finds flares in time series using the method in Walkowicz+ 2011.
FIXME: finish this.
Parameters:  times,mags,errs (np.array) – The input timeseries to find flares in.
 smoothbinsize (int) – The number of consecutive light curve points to smooth over in the time series using a Savitzky-Golay filter. The smoothed light curve is then subtracted from the actual light curve to remove trends that potentially last smoothbinsize light curve points. The default value is chosen as ~6.5 hours (97 x 4 minute cadence for HATNet/HATSouth).
 flare_minsigma (float) – The minimum sigma above the median LC level to designate points as belonging to possible flares.
 flare_maxcadencediff (int) – The maximum number of light curve points apart each possible flare event measurement is allowed to be. If this is 1, then we’ll look for consecutive measurements.
 flare_mincadencepoints (int) – The minimum number of light curve points (each flare_maxcadencediff points apart) required that are at least flare_minsigma above the median light curve level to call an event a flare.
 magsarefluxes (bool) – If True, indicates that mags is actually an array of fluxes.
 savgol_polyorder (int) – The polynomial order of the function used by the Savitzky-Golay filter.
 savgol_kwargs (extra kwargs) – Any remaining keyword arguments are passed directly to the savgol_filter function from scipy.signal.
Returns: (nflares, flare_indices) – Returns the total number of flares found and their time indices (start, end) as tuples.
Return type: tuple
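The detection logic described by these parameters (smooth, subtract, then look for runs of consecutive points well above the noise) can be sketched as below. This is not the Astrobase implementation, which is itself marked unfinished above: flare_find_sketch is a hypothetical name, it handles fluxes only, and a running median stands in for the Savitzky-Golay filter to keep the example NumPy-only.

```python
import numpy as np


def flare_find_sketch(flux, smoothbinsize=21, minsigma=4.0,
                      maxcadencediff=1, mincadencepoints=3):
    """Find runs of consecutive points well above a smoothed light curve.

    Hypothetical illustration for flux time series; a running median
    stands in for the Savitzky-Golay smoothing.
    """
    flux = np.asarray(flux, dtype=float)
    half = smoothbinsize // 2

    # running median, with the window truncated at the array edges
    smooth = np.array([np.median(flux[max(0, i - half):i + half + 1])
                       for i in range(flux.size)])
    resid = flux - smooth
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))

    candidates = np.flatnonzero(resid > minsigma * sigma)
    if candidates.size == 0:
        return 0, []

    # split the candidate indices wherever they are too far apart
    breaks = np.flatnonzero(np.diff(candidates) > maxcadencediff) + 1
    groups = np.split(candidates, breaks)

    flares = [(int(g[0]), int(g[-1])) for g in groups
              if g.size >= mincadencepoints]
    return len(flares), flares


rng = np.random.default_rng(7)
flux = 1.0 + 0.001 * rng.standard_normal(300)
flux[150:156] += 0.02 * np.exp(-np.arange(6) / 3.0)   # injected flare
nflares, flare_indices = flare_find_sketch(flux)
```

Requiring several consecutive points above the threshold is what separates a flare from an isolated outlier, which a single noisy measurement can easily produce.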
astrobase.varbase.signals module¶
Contains functions to deal with masking and removing periodic signals in light curves.

astrobase.varbase.signals.
prewhiten_magseries
(times, mags, errs, whitenperiod, whitenparams, sigclip=3.0, magsarefluxes=False, plotfit=None, plotfitphasedlconly=True, rescaletomedian=True)[source]¶ Removes a periodic sinusoidal signal generated using whitenparams from the input magnitude time series.
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to prewhiten.
 whitenperiod (float) – The period of the sinusoidal signal to remove.
 whitenparams (list of floats) –
This contains the Fourier amplitude and phase coefficients of the sinusoidal signal to remove:
[ampl_1, ampl_2, ampl_3, ..., ampl_X, pha_1, pha_2, pha_3, ..., pha_X]
where X is the Fourier order. These are usually the output of a previous Fourier fit to the light curve (from astrobase.lcfit.sinusoidal.fourier_fit_magseries() for example).
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will clip out dimmings greater than 10-sigma and brightenings greater than 3-sigma. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elements removed) will be passed through to the output.
 magsarefluxes (bool) – If True, will treat the input values of mags as fluxes for purposes of plotting the fit and sigclipping.
 plotfit (str or None) – If this is a string, this function will make a plot showing the effect of the pre-whitening on the mag/flux timeseries and write the plot to the path specified here.
 plotfitphasedlconly (bool) – If True, will plot only the phased LC for showing the effect of prewhitening, and skip plotting the unphased LC.
 rescaletomedian (bool) – If this is True, then we add back the constant median term of the magnitudes to the final prewhitened mag series.
Returns: Returns a dict of the form:
{'wtimes': times array after prewhitening,
 'wphase': phase array after prewhitening,
 'wmags': mags array after prewhitening,
 'werrs': errs array after prewhitening,
 'whitenparams': the input prewhitening params used,
 'whitenperiod': the input prewhitening period used,
 'fitplotfile': the output plot file if plotfit was set}
Return type: dict
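Prewhitening amounts to evaluating the Fourier series at the phase of each observation and subtracting it. The sketch below illustrates this with a hypothetical function, prewhiten_sketch; taking the epoch as times.min() and numbering the harmonics from 1 are assumptions of this example, so coefficients from an actual Astrobase fit may not be directly interchangeable with it.

```python
import numpy as np


def prewhiten_sketch(times, mags, whitenperiod, whitenparams,
                     rescaletomedian=True):
    """Subtract a Fourier cosine series of known period from `mags`.

    Hypothetical illustration; the epoch is taken to be times.min() and
    harmonics are numbered from 1, both assumptions of this sketch.
    """
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)

    # first half of whitenparams: amplitudes; second half: phases
    order = len(whitenparams) // 2
    amplitudes, phases = whitenparams[:order], whitenparams[order:]

    phase = ((times - times.min()) / whitenperiod) % 1.0
    signal = np.zeros_like(mags)
    for k, (amp, pha) in enumerate(zip(amplitudes, phases), start=1):
        signal += amp * np.cos(2.0 * np.pi * k * phase + pha)

    # remove the median and the periodic signal, then optionally
    # restore the median level
    wmags = mags - np.median(mags) - signal
    if rescaletomedian:
        wmags = wmags + np.median(mags)
    return wmags


times = np.linspace(0.0, 10.0, 400, endpoint=False)
true_signal = 0.3 * np.cos(2.0 * np.pi * times / 2.5 + 0.4)
mags = 11.0 + true_signal
wmags = prewhiten_sketch(times, mags, 2.5, [0.3, 0.4])
```

Because the injected signal matches the supplied coefficients exactly, the whitened series is flat at the original median level of 11.0.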

astrobase.varbase.signals.
gls_prewhiten
(times, mags, errs, fourierorder=3, initfparams=None, startp_gls=None, endp_gls=None, stepsize=0.0001, autofreq=True, sigclip=30.0, magsarefluxes=False, nbestpeaks=5, nworkers=4, plotfits=None)[source]¶ Iterative prewhitening of a magnitude series using the LS periodogram.
This finds the best period, fits a fourier series with the best period, then whitens the time series with the best period, and repeats until nbestpeaks are done.
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to iteratively prewhiten.
 fourierorder (int) – The Fourier order of the sinusoidal signal to fit to the timeseries and iteratively remove.
 initfparams (list or None) –
If this is provided, should be a list of Fourier amplitudes and phases in the following format:
[ampl_1, ampl_2, ampl_3, ..., ampl_X, pha_1, pha_2, pha_3, ..., pha_X]
where X is the Fourier order. These are usually the output of a previous Fourier fit to the light curve (from astrobase.lcfit.sinusoidal.fourier_fit_magseries() for example). You MUST provide ONE of fourierorder and initfparams, but not both. If both are provided or both are None, a sinusoidal signal of Fourier order 3 will be used by default.
 startp_gls,endp_gls (float or None) – If these are provided, they will serve as input to the Generalized Lomb-Scargle function that will attempt to find the best nbestpeaks periods in the timeseries. These set the minimum and maximum period to search for in the timeseries.
 stepsize (float) – The stepsize in frequency to use when constructing a frequency grid for the period search.
 autofreq (bool) – If this is True, the value of stepsize will be ignored and the astrobase.periodbase.get_frequency_grid() function will be used to generate a frequency grid based on startp_gls and endp_gls. If these are None as well, startp_gls will be set to 0.1 and endp_gls will be set to times.max() - times.min().
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will clip out dimmings greater than 10-sigma and brightenings greater than 3-sigma. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elements removed) will be passed through to the output.
 magsarefluxes (bool) – If the input measurement values in mags and errs are in fluxes, set this to True.
 nbestpeaks (int) – The number of ‘best’ peaks to return from the periodogram results, starting from the global maximum of the periodogram peak values.
 nworkers (int) – The number of parallel workers to use when calculating the periodogram.
 plotfits (None or str) – If this is a str, should indicate the file to which a plot of the successive iterations of prewhitening will be written to. This will contain a row of plots indicating the before/after states of the light curves for each round of prewhitening.
Returns: (bestperiods, plotfile) – This returns a list of the best periods (with the “highest” peak in the periodogram) after each round of prewhitening is done. If plotfits is a str, will also return the path to the generated plot file.
Return type: tuple

astrobase.varbase.signals.
mask_signal
(times, mags, errs, signalperiod, signalepoch, magsarefluxes=False, maskphases=(0, 0, 0.5, 1.0), maskphaselength=0.1, plotfit=None, plotfitphasedlconly=True, sigclip=30.0)[source]¶ This removes repeating signals in the magnitude time series.
Useful for masking planetary transit signals in light curves to search for other variability.
A small worked example of using this and prewhiten_magseries above:
https://github.com/waqasbhatti/astrobase/issues/77#issuecomment463803558
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to run the masking on.
 signalperiod (float) – The period of the signal to mask.
 signalepoch (float) – The epoch of the signal to mask.
 magsarefluxes (bool) – Set to True if mags is actually an array of fluxes.
 maskphases (sequence of floats) – This defines which phase values will be masked. For each item in this sequence, this function will mask a length of phase given by maskphaselength centered on each maskphases value, and remove all LC points in these regions from the light curve.
 maskphaselength (float) – The length in phase to mask for each phase value provided in maskphases.
 plotfit (str or None) – If provided as a str, indicates the output plot file.
 plotfitphasedlconly (bool) – If True, will only plot the effect of masking the signal as requested on the phased LC. If False, will also plot the unphased LC.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigma-clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
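
To make the roles of maskphases and maskphaselength concrete, here is a minimal sketch of phase masking using only the standard library. The helper name mask_phases is hypothetical, and treating phase as circular (so that phases 0.0 and 1.0 describe the same point) is an assumption of this example rather than a statement about how mask_signal is implemented.

```python
def mask_phases(times, values, period, epoch,
                maskphases=(0.0, 0.5), maskphaselength=0.1):
    """Hypothetical sketch: drop points within maskphaselength/2 of each
    phase in maskphases, returning the kept (times, values)."""
    kept_times, kept_values = [], []
    for t, v in zip(times, values):
        phase = ((t - epoch) / period) % 1.0
        masked = False
        for mphase in maskphases:
            # circular distance between the point's phase and the mask center
            dist = abs(phase - (mphase % 1.0))
            dist = min(dist, 1.0 - dist)
            if dist < maskphaselength / 2.0:
                masked = True
                break
        if not masked:
            kept_times.append(t)
            kept_values.append(v)
    return kept_times, kept_values


times = [0.0, 0.1, 0.25, 0.5, 0.74, 0.98]
fluxes = [0.99, 1.0, 1.0, 0.995, 1.0, 0.99]
kept_times, kept_fluxes = mask_phases(times, fluxes, period=1.0, epoch=0.0)
```

Here the points at phases 0.0, 0.5, and 0.98 fall inside a masked window and are removed; the rest are kept.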
astrobase.varbase.transits module¶
Contains tools for analyzing transits.

astrobase.varbase.transits.
transit_duration_range
(period, min_radius_hint, max_radius_hint)[source]¶ This figures out the minimum and max transit duration (q) given a period and min/max stellar radius hints.
One can get stellar radii from various places:
 GAIA distances and luminosities
 the TESS input catalog
 isochrone fits
The equation used is:
q ~ 0.076 x R**(2/3) x P**(-2/3)
P = period in days
R = stellar radius in solar radii
Parameters:  period (float) – The orbital period of the transiting planet.
 min_radius_hint,max_radius_hint (float) – The minimum and maximum radii of the star the planet is orbiting around.
Returns: (min_transit_duration, max_transit_duration) – The returned tuple contains the minimum and maximum transit durations allowed for the orbital geometry of this planetary system. These can be used with the BLS period-search functions in
astrobase.periodbase.kbls
or astrobase.periodbase.abls
to refine the period-search to only physically possible transit durations.
Return type: tuple
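
As a quick check of the scaling relation above, the following sketch evaluates q ~ 0.076 x R**(2/3) x P**(-2/3) directly. The function name is hypothetical and the code only restates the documented formula.

```python
def transit_duration_range_sketch(period, min_radius_hint, max_radius_hint):
    """Hypothetical sketch of q ~ 0.076 * R**(2/3) * P**(-2/3).

    period is in days, the radius hints are in solar radii.
    """
    scale = 0.076 * period ** (-2.0 / 3.0)
    return (scale * min_radius_hint ** (2.0 / 3.0),
            scale * max_radius_hint ** (2.0 / 3.0))


# An 8-day period with stellar radii between 1 and 8 solar radii
qmin, qmax = transit_duration_range_sketch(8.0, 1.0, 8.0)
```

For a fixed stellar radius, q decreases as the period grows.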

astrobase.varbase.transits.
get_snr_of_dip
(times, mags, modeltimes, modelmags, atol_normalization=1e-08, indsforrms=None, magsarefluxes=False, verbose=True, transitdepth=None, npoints_in_transit=None)[source]¶ Calculate the total SNR of a transit assuming Gaussian uncertainties.
modelmags gets interpolated onto the cadence of mags. The noise is calculated as the 1-sigma std deviation of the residual (see below).
Following Carter et al. 2009:
Q = sqrt( Γ T ) * δ / σ
for Q the total SNR of the transit in the r->0 limit, where:
r = Rp/Rstar,
T = transit duration,
δ = transit depth,
σ = RMS of the lightcurve in transit,
Γ = sampling rate.
Thus Γ * T is roughly the number of points obtained during transit. (This doesn’t correctly account for the SNR during ingress/egress, but this is a second-order correction).
Note this is the same total SNR as described by e.g., Kovacs et al. 2002, their Equation 11.
NOTE: this only works with fluxes at the moment.
Parameters:  times,mags (np.array) – The input flux timeseries to process.
 modeltimes,modelmags (np.array) – A transiting planet model, either from BLS, a trapezoid model, or a Mandel-Agol model.
 atol_normalization (float) – The absolute tolerance to which the median of the passed model fluxes must be equal to 1.
 indsforrms (np.array) – An array of bools of len(mags) used to select points for the RMS measurement. If not passed, the RMS of the entire passed timeseries is used as an approximation. Generally, it’s best to use out-of-transit points, so the RMS measurement is not model-dependent.
 magsarefluxes (bool) – Currently forced to be True because this function only works with fluxes.
 verbose (bool) – If True, indicates progress and warns about problems.
 transitdepth (float or None) – If the transit depth is known, pass it in here. Otherwise, it is calculated assuming OOT flux is 1.
 npoints_in_transit (int or None) – If the number of points in transit is known, pass it in here. Otherwise, the function will guess at this value.
Returns: (snr, transit_depth, noise) – The returned tuple contains the calculated SNR, transit depth, and noise of the residual lightcurve calculated using the relation described above.
Return type: tuple
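
The SNR relation above reduces to simple arithmetic once Γ * T is replaced by the number of in-transit points. The sketch below is only a restatement of Q = sqrt(Γ T) * δ / σ with a hypothetical function name; it does not interpolate a model or measure the noise as get_snr_of_dip does.

```python
import math


def transit_snr_sketch(transit_depth, noise, npoints_in_transit):
    """Hypothetical sketch: Q = sqrt(N_in_transit) * depth / noise."""
    return math.sqrt(npoints_in_transit) * transit_depth / noise


# A 1% deep transit, 0.2% per-point scatter, 100 in-transit points
snr = transit_snr_sketch(0.01, 0.002, 100)
```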

astrobase.varbase.transits.
estimate_achievable_tmid_precision
(snr, t_ingress_min=10, t_duration_hr=2.14)[source]¶ Using Carter et al. 2009’s estimate, calculate the theoretical optimal precision on mid-transit time measurement possible given a transit of a particular SNR.
The relation used is:
sigma_tc = Q^{-1} * T * sqrt(θ/2)
Q = SNR of the transit.
T = transit duration, which is 2.14 hours from discovery paper.
θ = τ/T = ratio of ingress to total duration ~= (few minutes [guess]) / 2.14 hours
Parameters:  snr (float) – The measured signal-to-noise of the transit, e.g. from
astrobase.periodbase.kbls.bls_stats_singleperiod()
or from running the .compute_stats() method on an Astropy BoxLeastSquares object.
 t_ingress_min (float) – The ingress duration in minutes. This is t_I to t_II in Winn (2010) nomenclature.
 t_duration_hr (float) – The transit duration in hours. This is t_I to t_IV in Winn (2010) nomenclature.
Returns: Returns the precision achievable for transit-center time as calculated from the relation above. This is in days.
Return type: float
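
The relation sigma_tc = T * sqrt(θ/2) / Q can be evaluated by hand with consistent units. This sketch uses plain floats and a hypothetical function name; it converts the duration to days so that the result is in days, as documented for the return value.

```python
import math


def tmid_precision_days_sketch(snr, t_ingress_min=10, t_duration_hr=2.14):
    """Hypothetical sketch: sigma_tc = T * sqrt(theta / 2) / Q, in days."""
    t_duration_day = t_duration_hr / 24.0
    # theta is dimensionless: ingress duration over total duration
    theta = (t_ingress_min / 60.0) / t_duration_hr
    return t_duration_day * math.sqrt(theta / 2.0) / snr


sigma_tc = tmid_precision_days_sketch(10.0, t_ingress_min=10, t_duration_hr=2.0)
```

For an SNR of 10, a 10-minute ingress, and a 2-hour transit, this gives roughly 0.0017 days, or about 2.4 minutes.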

astrobase.varbase.transits.
get_transit_times
(blsd, time, extra_maskfrac, trapd=None, nperiodint=1000)[source]¶ Given a BLS period, epoch, and transit ingress/egress points (usually from
astrobase.periodbase.kbls.bls_stats_singleperiod()
), return the times within transit durations + extra_maskfrac of each transit.
Optionally, can use the (more accurate) trapezoidal fit period and epoch, if it’s passed. Useful for inspecting individual transits, and masking them out if desired.
Parameters:  blsd (dict) – This is the dict returned by
astrobase.periodbase.kbls.bls_stats_singleperiod()
.
 time (np.array) – The times from the timeseries of transit observations used to calculate the initial period.
 extra_maskfrac (float) – This is the separation from in-transit points you desire, in units of the transit duration. extra_maskfrac = 0 if you just want points inside transit (see below).
 trapd (dict) – This is a dict returned by
astrobase.lcfit.transits.traptransit_fit_magseries()
containing the trapezoid transit model.
 nperiodint (int) – This indicates how many periods backwards/forwards to try and identify transits from the epochs reported in blsd or trapd.
Returns: (tmids_obsd, t_starts, t_ends) –
The returned items are:
tmids_obsd (np.ndarray): best guess of transit midtimes in lightcurve. Has length number of transits in lightcurve.
t_starts (np.ndarray): t_Is - extra_maskfrac*tdur, for t_Is transit first contact point.
t_ends (np.ndarray): t_IVs + extra_maskfrac*tdur, for t_IVs transit fourth contact point.
Return type: tuple of np.array
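
The role of nperiodint can be sketched as follows: step the epoch backwards and forwards by whole periods, then keep only the mid-times that fall inside the observed time span. The function name and the inclusive boundary check are assumptions of this example, not a description of the library code.

```python
def candidate_midtimes_sketch(times, period, epoch, nperiodint=1000):
    """Hypothetical sketch: transit mid-times that fall within the data."""
    tmin, tmax = min(times), max(times)
    candidates = (epoch + n * period
                  for n in range(-nperiodint, nperiodint + 1))
    return [tmid for tmid in candidates if tmin <= tmid <= tmax]


tmids = candidate_midtimes_sketch([0.0, 10.0], period=3.0, epoch=1.0,
                                  nperiodint=5)
```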

astrobase.varbase.transits.
given_lc_get_transit_tmids_tstarts_tends
(time, flux, err_flux, blsfit_savpath=None, trapfit_savpath=None, magsarefluxes=True, nworkers=1, sigclip=None, extra_maskfrac=0.03)[source]¶ Gets the transit start, middle, and end times for transits in a given timeseries of observations.
Parameters:  time,flux,err_flux (np.array) – The input flux timeseries measurements and their associated measurement errors
 blsfit_savpath (str or None) – If provided as a str, indicates the path of the fit plot to make for a simple BLS model fit to the transit using the obtained period and epoch.
 trapfit_savpath (str or None) – If provided as a str, indicates the path of the fit plot to make for a trapezoidal transit model fit to the transit using the obtained period and epoch.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigma-clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 magsarefluxes (bool) – This is by default True for this function, since it works on fluxes only at the moment.
 nworkers (int) – The number of parallel BLS periodfinder workers to use.
 extra_maskfrac (float) –
This is the separation (N) from in-transit points you desire, in units of the transit duration. extra_maskfrac = 0 if you just want points inside transit, otherwise:
t_starts = t_Is - N*tdur, t_ends = t_IVs + N*tdur
Thus setting N=0.03 masks slightly more than the guessed transit duration.
Returns: (tmids_obsd, t_starts, t_ends) –
The returned items are:
tmids_obsd (np.ndarray): best guess of transit midtimes in lightcurve. Has length number of transits in lightcurve.
t_starts (np.ndarray): t_Is - extra_maskfrac*tdur, for t_Is transit first contact point.
t_ends (np.ndarray): t_IVs + extra_maskfrac*tdur, for t_IVs transit fourth contact point.
Return type: tuple
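
The effect of extra_maskfrac on the returned windows can be sketched with plain arithmetic. This example assumes the first and fourth contact points sit half a transit duration before and after each mid-time; the function name is hypothetical.

```python
def transit_windows_sketch(tmids, tdur, extra_maskfrac=0.03):
    """Hypothetical sketch: widen each transit by extra_maskfrac * tdur."""
    t_starts, t_ends = [], []
    for tmid in tmids:
        t_I = tmid - tdur / 2.0   # assumed first contact
        t_IV = tmid + tdur / 2.0  # assumed fourth contact
        t_starts.append(t_I - extra_maskfrac * tdur)
        t_ends.append(t_IV + extra_maskfrac * tdur)
    return t_starts, t_ends


t_starts, t_ends = transit_windows_sketch([10.0, 13.0], tdur=0.2,
                                          extra_maskfrac=0.5)
```

With extra_maskfrac=0.5, each window is twice as long as the transit itself: half a duration of padding is added on each side.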

astrobase.varbase.transits.
given_lc_get_out_of_transit_points
(time, flux, err_flux, blsfit_savpath=None, trapfit_savpath=None, in_out_transit_savpath=None, sigclip=None, magsarefluxes=True, nworkers=1, extra_maskfrac=0.03)[source]¶ This gets the out-of-transit light curve points.
Relevant during iterative masking of transits for multiple-planet system searches.
Parameters:  time,flux,err_flux (np.array) – The input flux timeseries measurements and their associated measurement errors
 blsfit_savpath (str or None) – If provided as a str, indicates the path of the fit plot to make for a simple BLS model fit to the transit using the obtained period and epoch.
 trapfit_savpath (str or None) – If provided as a str, indicates the path of the fit plot to make for a trapezoidal transit model fit to the transit using the obtained period and epoch.
 in_out_transit_savpath (str or None) – If provided as a str, indicates the path of the plot file that will be made for a plot showing the in-transit points and out-of-transit points tagged separately.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigma-clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 magsarefluxes (bool) – This is by default True for this function, since it works on fluxes only at the moment.
 nworkers (int) – The number of parallel BLS periodfinder workers to use.
 extra_maskfrac (float) –
This is the separation (N) from in-transit points you desire, in units of the transit duration. extra_maskfrac = 0 if you just want points inside transit, otherwise:
t_starts = t_Is - N*tdur, t_ends = t_IVs + N*tdur
Thus setting N=0.03 masks slightly more than the guessed transit duration.
Returns: (times_oot, fluxes_oot, errs_oot) – The times, flux, err_flux values from the input at the time values out-of-transit are returned.
Return type: tuple of np.array
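
Selecting out-of-transit points from a set of transit windows amounts to a simple interval test. The sketch below uses plain lists and a hypothetical function name; it assumes the window start and end times are already known, whereas the function documented above derives them from BLS and trapezoid fits.

```python
def out_of_transit_sketch(times, fluxes, errs, t_starts, t_ends):
    """Hypothetical sketch: keep points outside every [t_start, t_end]."""
    windows = list(zip(t_starts, t_ends))
    times_oot, fluxes_oot, errs_oot = [], [], []
    for t, f, e in zip(times, fluxes, errs):
        in_transit = any(start <= t <= end for start, end in windows)
        if not in_transit:
            times_oot.append(t)
            fluxes_oot.append(f)
            errs_oot.append(e)
    return times_oot, fluxes_oot, errs_oot


times_oot, fluxes_oot, errs_oot = out_of_transit_sketch(
    [0.0, 1.0, 2.0, 3.0, 4.0],
    [1.0, 1.0, 0.99, 1.0, 1.0],
    [0.001] * 5,
    t_starts=[1.5],
    t_ends=[2.5],
)
```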
astrobase.varbase.trends module¶
Contains light curve trend-removal tools, such as external parameter decorrelation (EPD) and smoothing.

astrobase.varbase.trends.
smooth_magseries_ndimage_medfilt
(mags, windowsize)[source]¶ This smooths the magseries with a median filter that reflects the array at the boundary.
See https://docs.scipy.org/doc/scipy/reference/tutorial/ndimage.html for details.
Parameters:  mags (np.array) – The input mags/flux timeseries to smooth.
 windowsize (int) – This is an odd integer containing the smoothing window size.
Returns: The smoothed mag/flux timeseries array.
Return type: np.array

astrobase.varbase.trends.
smooth_magseries_signal_medfilt
(mags, windowsize)[source]¶ This smooths the magseries with a simple median filter.
This function pads with zeros near the boundary, see:
https://stackoverflow.com/questions/24585706/scipymedfiltwrongresult
Typically this is bad.
Parameters:  mags (np.array) – The input mags/flux timeseries to smooth.
 windowsize (int) – This is an odd integer containing the smoothing window size.
Returns: The smoothed mag/flux timeseries array.
Return type: np.array
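
The difference between reflecting the array at the boundary and padding with zeros is easiest to see on a short series. The pure-Python median filter below is a sketch written for illustration, not the SciPy code that these functions wrap. Its "reflect" branch mirrors the series about its edge with the edge value repeated, which is assumed here to match the default boundary mode of scipy.ndimage filters.

```python
from statistics import median


def median_filter_sketch(values, windowsize, boundary="reflect"):
    """Hypothetical sketch of a median filter with two boundary modes."""
    n = len(values)
    half = windowsize // 2
    smoothed = []
    for i in range(n):
        window = []
        for j in range(i - half, i + half + 1):
            if 0 <= j < n:
                window.append(values[j])
            elif boundary == "reflect":
                # mirror about the edge, repeating the edge value
                k = -j - 1 if j < 0 else 2 * n - j - 1
                window.append(values[k])
            else:
                window.append(0.0)  # zero padding
        smoothed.append(median(window))
    return smoothed


mags = [10.0, 11.0, 12.0, 13.0, 14.0]
reflected = median_filter_sketch(mags, 5, boundary="reflect")
zero_padded = median_filter_sketch(mags, 5, boundary="zero")
```

The two results agree in the middle of the series but differ at both ends, where the padded zeros displace real measurements in the window.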

astrobase.varbase.trends.
smooth_magseries_gaussfilt
(mags, windowsize, windowfwhm=7)[source]¶ This smooths the magseries with a Gaussian kernel.
Parameters:  mags (np.array) – The input mags/flux timeseries to smooth.
 windowsize (int) – This is an odd integer containing the smoothing window size.
 windowfwhm (int) – This is an odd integer containing the FWHM of the applied Gaussian window function.
Returns: The smoothed mag/flux timeseries array.
Return type: np.array

astrobase.varbase.trends.
smooth_magseries_savgol
(mags, windowsize, polyorder=2)[source]¶ This smooths the magseries with a Savitzky-Golay filter.
Parameters:  mags (np.array) – The input mags/flux timeseries to smooth.
 windowsize (int) – This is an odd integer containing the smoothing window size.
 polyorder (int) – This is an integer containing the polynomial degree order to use when generating the Savitzky-Golay filter.
Returns: The smoothed mag/flux timeseries array.
Return type: np.array

astrobase.varbase.trends.
epd_magseries
(times, mags, errs, fsv, fdv, fkv, xcc, ycc, bgv, bge, iha, izd, magsarefluxes=False, epdsmooth_sigclip=3.0, epdsmooth_windowsize=21, epdsmooth_func=<function smooth_magseries_savgol>, epdsmooth_extraparams=None)[source]¶ Detrends a magnitude series using External Parameter Decorrelation.
Requires a set of external parameters similar to those present in HAT light curves. At the moment, the HAT light-curve-specific external parameters are:
 S: the ‘fsv’ column in light curves,
 D: the ‘fdv’ column in light curves,
 K: the ‘fkv’ column in light curves,
 x coords: the ‘xcc’ column in light curves,
 y coords: the ‘ycc’ column in light curves,
 background value: the ‘bgv’ column in light curves,
 background error: the ‘bge’ column in light curves,
 hour angle: the ‘iha’ column in light curves,
 zenith distance: the ‘izd’ column in light curves
S, D, and K are defined as follows:
 S -> measure of PSF sharpness (~1/sigma^2, so smaller S = wider PSF)
 D -> measure of PSF ellipticity in xy direction
 K -> measure of PSF ellipticity in cross direction
S, D, K are related to the PSF’s variance and covariance, see eqn 30-33 in A. Pal’s thesis: https://arxiv.org/abs/0906.3486
NOTE: The errs are completely ignored and returned unchanged (except for sigclip and finite filtering).
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to detrend.
 fsv (np.array) – Array containing the external parameter S of the same length as times.
 fdv (np.array) – Array containing the external parameter D of the same length as times.
 fkv (np.array) – Array containing the external parameter K of the same length as times.
 xcc (np.array) – Array containing the external parameter x-coords of the same length as times.
 ycc (np.array) – Array containing the external parameter y-coords of the same length as times.
 bgv (np.array) – Array containing the external parameter background value of the same length as times.
 bge (np.array) – Array containing the external parameter background error of the same length as times.
 iha (np.array) – Array containing the external parameter hour angle of the same length as times.
 izd (np.array) – Array containing the external parameter zenith distance of the same length as times.
 magsarefluxes (bool) – Set this to True if mags actually contains fluxes.
 epdsmooth_sigclip (float or int or sequence of two floats/ints or None) –
This specifies how to sigma-clip the input LC before fitting the EPD function to it.
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigma-clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 epdsmooth_windowsize (int) – This is the number of LC points to smooth over to generate a smoothed light curve that will be used to fit the EPD function.
 epdsmooth_func (Python function) –
This sets the smoothing filter function to use. A Savitzky-Golay filter is used to smooth the light curve by default. The functions that can be used with this kwarg are listed in varbase.trends. If you want to use your own function, it MUST have the following signature:
def smoothfunc(mags_array, window_size, **extraparams)
and return a numpy array of the same size as mags_array with the smoothed timeseries. Any extra params can be provided using the extraparams dict.
 epdsmooth_extraparams (dict) – This is a dict of any extra filter params to supply to the smoothing function.
Returns: Returns a dict of the following form:
{'times':the input times after non-finite elems removed,
 'mags':the EPD detrended mag values (the EPD mags),
 'errs':the errs after non-finite elems removed,
 'fitcoeffs':EPD fit coefficient values,
 'fitinfo':the full tuple returned by scipy.optimize.leastsq,
 'fitmags':the EPD fit function evaluated at times,
 'mags_median': this is median of the EPD mags,
 'mags_mad': this is the MAD of EPD mags}
Return type: dict
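
A custom smoother passed as epdsmooth_func only has to follow the signature given above and return an array of the same size as its input. The running-mean filter below is a minimal sketch of such a function: its name and its padmode extra parameter are hypothetical, and it has not been checked against epd_magseries itself.

```python
import numpy as np


def smooth_running_mean(mags_array, window_size, **extraparams):
    """Hypothetical smoother: running mean over window_size points.

    window_size is assumed to be odd so the output has the input's length.
    """
    mags_array = np.asarray(mags_array, dtype=float)
    half = window_size // 2
    # pad both ends so the 'valid' convolution returns len(mags_array) points
    padded = np.pad(mags_array, half,
                    mode=extraparams.get("padmode", "reflect"))
    kernel = np.ones(window_size) / window_size
    return np.convolve(padded, kernel, mode="valid")


mags = np.full(10, 12.5)
smoothed = smooth_running_mean(mags, 5)
```

Any options for such a function would be supplied through the epdsmooth_extraparams dict, for example {'padmode': 'edge'} for this sketch.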

astrobase.varbase.trends.
epd_magseries_extparams
(times, mags, errs, externalparam_arrs, initial_coeff_guess, magsarefluxes=False, epdsmooth_sigclip=3.0, epdsmooth_windowsize=21, epdsmooth_func=<function smooth_magseries_savgol>, epdsmooth_extraparams=None, objective_func=<function _epd_residual2>, objective_kwargs=None, optimizer_func=<function least_squares>, optimizer_kwargs=None)[source]¶ This does EPD on a magseries with arbitrary external parameters.
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to run EPD on.
 externalparam_arrs (list of np.arrays) – This is a list of ndarrays of external parameters to decorrelate against. These should all be the same size as times, mags, errs.
 initial_coeff_guess (np.array) – An array of initial fit coefficients to pass into the objective function.
 epdsmooth_sigclip (float or int or sequence of two floats/ints or None) –
This specifies how to sigma-clip the input LC before smoothing it and fitting the EPD function to it. The actual LC will not be sigma-clipped.
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigma-clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 epdsmooth_windowsize (int) – This is the number of LC points to smooth over to generate a smoothed light curve that will be used to fit the EPD function.
 epdsmooth_func (Python function) –
This sets the smoothing filter function to use. A Savitzky-Golay filter is used to smooth the light curve by default. The functions that can be used with this kwarg are listed in varbase.trends. If you want to use your own function, it MUST have the following signature:
def smoothfunc(mags_array, window_size, **extraparams)
and return a numpy array of the same size as mags_array with the smoothed timeseries. Any extra params can be provided using the extraparams dict.
 epdsmooth_extraparams (dict) – This is a dict of any extra filter params to supply to the smoothing function.
 objective_func (Python function) –
The function that calculates residuals between the model and the smoothed magseries. This must have the following signature:
def objective_func(fit_coeffs, times, mags, errs, *external_params, **objective_kwargs)
where times, mags, errs are arrays of the sigma-clipped and smoothed timeseries, fit_coeffs is an array of EPD fit coefficients, external_params is a tuple of the passed in external parameter arrays, and objective_kwargs is a dict of any optional kwargs to pass into the objective function.
This should return the value of the residual based on evaluating the model function (and any weights based on errs or times).
 objective_kwargs (dict or None) – A dict of kwargs to pass into the objective_func function.
 optimizer_func (Python function) –
The function that minimizes the residual between the model and the smoothed magseries using the objective_func. This should have a signature similar to one of the optimizer functions in scipy.optimize, i.e.:
def optimizer_func(objective_func, initial_coeffs, args=(), ...)
and return a scipy.optimize.OptimizeResult. We’ll rely on the
.success
attribute to determine if the EPD fit was successful, and the
.x
attribute to get the values of the fit coefficients.
 optimizer_kwargs (dict or None) – A dict of kwargs to pass into the optimizer_func function.
Returns: Returns a dict of the following form:
{'times':the input times after non-finite elems removed,
 'mags':the EPD detrended mag values (the EPD mags),
 'errs':the errs after non-finite elems removed,
 'fitcoeffs':EPD fit coefficient values,
 'fitinfo':the result returned by the optimizer function,
 'mags_median': this is the median of the EPD mags,
 'mags_mad': this is the MAD of EPD mags}
Return type: dict
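
An objective function with the signature required by objective_func might look like the following sketch. It assumes a purely linear model (a constant plus one coefficient per external parameter) and returns a vector of residuals, which is the form a least-squares optimizer usually expects. The function name and the weighted keyword are hypothetical, and the sketch is not the built-in default objective.

```python
import numpy as np


def linear_epd_residual(fit_coeffs, times, mags, errs,
                        *external_params, **objective_kwargs):
    """Hypothetical objective: residuals of a linear model in the
    external parameters, optionally weighted by 1/errs."""
    model = np.full(len(mags), fit_coeffs[0], dtype=float)
    for coeff, param in zip(fit_coeffs[1:], external_params):
        model += coeff * np.asarray(param, dtype=float)
    residual = np.asarray(mags, dtype=float) - model
    if objective_kwargs.get("weighted", False):
        residual = residual / np.asarray(errs, dtype=float)
    return residual


times = np.linspace(0.0, 1.0, 20)
xpos = np.sin(2.0 * np.pi * times)      # a made-up external parameter
mags = 10.0 + 0.5 * xpos                # mags fully explained by xpos
errs = np.full(20, 0.01)
resid = linear_epd_residual(np.array([10.0, 0.5]), times, mags, errs, xpos)
```

With this model, initial_coeff_guess would need one entry for the constant term plus one per array in externalparam_arrs.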

astrobase.varbase.trends.
rfepd_magseries
(times, mags, errs, externalparam_arrs, magsarefluxes=False, epdsmooth=True, epdsmooth_sigclip=3.0, epdsmooth_windowsize=21, epdsmooth_func=<function smooth_magseries_savgol>, epdsmooth_extraparams=None, rf_subsample=1.0, rf_ntrees=300, rf_extraparams={'criterion': 'mse', 'n_jobs': -1, 'oob_score': False})[source]¶ This uses a RandomForestRegressor to decorrelate the given magseries.
Parameters:  times,mags,errs (np.array) – The input mag/flux timeseries to run EPD on.
 externalparam_arrs (list of np.arrays) – This is a list of ndarrays of external parameters to decorrelate against. These should all be the same size as times, mags, errs.
 epdsmooth (bool) – If True, sets the training LC for the RandomForestRegressor to be a smoothed version of the sigma-clipped light curve provided in times, mags, errs.
 epdsmooth_sigclip (float or int or sequence of two floats/ints or None) –
This specifies how to sigma-clip the input LC before smoothing it and fitting the EPD function to it. The actual LC will not be sigma-clipped.
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigma-clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 epdsmooth_windowsize (int) – This is the number of LC points to smooth over to generate a smoothed light curve that will be used to fit the EPD function.
 epdsmooth_func (Python function) –
This sets the smoothing filter function to use. A Savitzky-Golay filter is used to smooth the light curve by default. The functions that can be used with this kwarg are listed in varbase.trends. If you want to use your own function, it MUST have the following signature:
def smoothfunc(mags_array, window_size, **extraparams)
and return a numpy array of the same size as mags_array with the smoothed timeseries. Any extra params can be provided using the extraparams dict.
 epdsmooth_extraparams (dict) – This is a dict of any extra filter params to supply to the smoothing function.
 rf_subsample (float) – Defines the fraction of the size of the mags array to use for training the random forest regressor.
 rf_ntrees (int) – This is the number of trees to use for the RandomForestRegressor.
 rf_extraparams (dict) – This is a dict of any extra kwargs to provide to the RandomForestRegressor instance used.
Returns: Returns a dict with decorrelated mags and the usual info from the RandomForestRegressor: variable importances, etc.
Return type: dict
astrobase.plotbase module¶
Contains various useful functions for plotting light curves and associated data.

astrobase.plotbase.
plot_magseries
(times, mags, magsarefluxes=False, errs=None, out=None, sigclip=30.0, normto='globalmedian', normmingap=4.0, timebin=None, yrange=None, segmentmingap=100.0, plotdpi=100)[source]¶ This plots a magnitude/flux timeseries.
Parameters:  times,mags (np.array) – The mag/flux timeseries to plot as a function of time.
 magsarefluxes (bool) –
Indicates if the input mags array is actually an array of flux measurements instead of magnitude measurements. If this is set to True, then the plot y-axis will be set as appropriate for mag or fluxes. In addition:
 if normto is ‘zero’, then each observation’s flux value is divided by the median flux to yield normalized fluxes with 1.0 as the global median.
 if normto is ‘globalmedian’, then each measurement is multiplied by the global median flux value across the entire time series.
 if normto is set to a float, then the flux value for each measurement is multiplied by this number.
 errs (np.array or None) – If this is provided, contains the measurement errors associated with each measurement of flux/mag in timeseries. Providing this kwarg will add errbars to the output plot.
 out (str or StringIO/BytesIO object or None) –
Sets the output type and target:
 If out is a string, will save the plot to the specified file name.
 If out is a StringIO/BytesIO object, will save the plot to that file handle. This can be useful to carry out additional operations on the output binary stream, or convert it to base64 text for embedding in HTML pages.
 If out is None, will save the plot to a file called ‘magseriesplot.png’ in the current working directory.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigma-clip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 normto ({'globalmedian', 'zero'} or a float) –
Sets the normalization target:
'globalmedian' -> norms each mag to the global median of the LC column
'zero' -> norms each mag to zero
a float -> norms each mag to this specified float value.
 normmingap (float) – This defines the minimum time difference between consecutive measurements required to consider them as parts of different timegroups. By default it is set to 4.0 days.
 timebin (float or None) – The bin size to use to group together measurements closer than this amount in time. This is in seconds. If this is None, no time-binning will be performed.
 yrange (list of two floats or None) – This is used to provide a custom y-axis range to the plot. If None, will automatically determine y-axis range.
 segmentmingap (float or None) – This controls the minimum length of time (in days) required to consider a timegroup in the light curve as a separate segment. This is useful when the light curve consists of measurements taken over several seasons, so there’s lots of dead space in the plot that can be cut out to zoom in on the interesting stuff. If segmentmingap is not None, the magseries plot will be cut in this way and the x-axis will show these breaks.
 plotdpi (int) – Sets the resolution in DPI for PNG plots (default = 100).
Returns: Returns based on the input:
 If out is a str or None, the path to the generated plot file is returned.
 If out is a StringIO/BytesIO object, will return the StringIO/BytesIO object to which the plot was written.
Return type: str or BytesIO/StringIO object

astrobase.plotbase.
plot_phased_magseries
(times, mags, period, epoch='min', fitknotfrac=0.01, errs=None, magsarefluxes=False, normto='globalmedian', normmingap=4.0, sigclip=30.0, phasewrap=True, phasesort=True, phasebin=None, plotphaselim=(-0.8, 0.8), yrange=None, xtimenotphase=False, xaxlabel='phase', yaxlabel=None, modelmags=None, modeltimes=None, modelerrs=None, outfile=None, plotdpi=100)[source]¶ Plots a phased magnitude/flux timeseries using the period provided.
Parameters:  times,mags (np.array) – The mag/flux timeseries to plot as a function of phase given period.
 period (float) – The period to use to phasefold the timeseries. Should be the same unit as times (usually in days)
 epoch ('min' or float or None) –
This indicates how to get the epoch to use for phasing the light curve:
 If None, uses the min(times) as the epoch for phasing.
 If epoch is the string ‘min’, then fits a cubic spline to the phased light curve using min(times) as the initial epoch, finds the magnitude/flux minimum of this phased light curve fit, and finally uses that time value as the epoch. This is useful for plotting planetary transits and eclipsing binary phased light curves so that phase 0.0 corresponds to the mid-center time of primary eclipse (or transit).
 If epoch is a float, then uses that directly to phase the light curve and as the epoch of the phased mag series plot.
 fitknotfrac (float) – If epoch=’min’, this function will attempt to fit a cubic spline to the phased light curve to find a time of light minimum as phase 0.0. This kwarg sets the number of knots to generate the spline as a fraction of the total number of measurements in the input timeseries. By default, this is set so that 100 knots are used to generate a spline for fitting the phased light curve consisting of 10000 measurements.
 errs (np.array or None) – If this is provided, contains the measurement errors associated with each measurement of flux/mag in timeseries. Providing this kwarg will add errbars to the output plot.
 magsarefluxes (bool) – Indicates if the input mags array is actually an array of flux measurements instead of magnitude measurements. If this is set to True, then the plot yaxis will be set as appropriate for mag or fluxes.
 normto ({'globalmedian', 'zero'} or a float) –
Sets the normalization target:
 'globalmedian' -> norms each mag to the global median of the LC column
 'zero' -> norms each mag to zero
 a float -> norms each mag to this specified float value.
 normmingap (float) – This defines the minimum time difference between consecutive measurements required to consider them as parts of different timegroups. By default it is set to 4.0 days.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 phasewrap (bool) – If this is True, the phased timeseries will be wrapped around phase 0.0.
 phasesort (bool) – If this is True, the phased timeseries will be sorted in phase.
 phasebin (float or None) – If this is provided, indicates the bin size to use to group together measurements closer than this amount in phase. This is in units of phase. The binned phased light curve will be overplotted on top of the phased light curve. Useful for when one has many measurement points and needs to pick out a small trend in an otherwise noisy phased light curve.
 plotphaselim (sequence of two floats or None) – The x-axis limits to use when making the phased light curve plot. By default, this is (-0.8, 0.8), which places phase 0.0 at the center of the plot and covers approximately two cycles in phase to make any trends clear.
 yrange (list of two floats or None) – This is used to provide a custom y-axis range to the plot. If None, will automatically determine y-axis range.
 xtimenotphase (bool) – If True, the x-axis gets units of time (multiplies phase by period).
 xaxlabel (str) – Sets the label for the x-axis.
 yaxlabel (str or None) – Sets the label for the y-axis. If this is None, the appropriate label will be used based on the value of the magsarefluxes kwarg.
 modeltimes,modelmags,modelerrs (np.array or None) – If all of these are provided, then this function will overplot the values of modeltimes and modelmags on top of the actual phased light curve. This is useful for plotting variability models on top of the light curve (e.g. plotting a Mandel-Agol transit model over the actual phased light curve). These arrays will be phased using the already provided period and epoch.
 outfile (str or StringIO/BytesIO or matplotlib.axes.Axes or None) –
 a string filename for the file where the plot will be written.
 a StringIO/BytesIO object to where the plot will be written.
 a matplotlib.axes.Axes object to where the plot will be written.
 if None, plots to ‘magseriesphasedplot.png’ in current dir.
 plotdpi (int) – Sets the resolution in DPI for PNG plots (default = 100).
Returns: This returns based on the input:
 If outfile is a str or None, the path to the generated plot file is returned.
 If outfile is a StringIO/BytesIO object, will return the StringIO/BytesIO object to which the plot was written.
 If outfile is a matplotlib.axes.Axes object, will return the Axes object with the plot elements added to it. One can then directly include this Axes object in some other Figure.
Return type: str or StringIO/BytesIO or matplotlib.axes.Axes
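The epoch, phasesort, and phasewrap options described above can be illustrated with a small phase-folding sketch. This is not the code astrobase runs: the function name phase_fold is hypothetical, plain Python lists are used to keep it dependency-free, and wrapping is implemented here by prepending a copy of the phases shifted by -1.0, which is an assumption made for illustration.

```python
import math


def phase_fold(times, mags, period, epoch, wrap=True, sort=True):
    """Return (phases, mags) folded on `period` with phase 0.0 at `epoch`."""
    folded = []
    for t, m in zip(times, mags):
        # phase is the fractional part of the number of cycles since epoch
        cycles = (t - epoch) / period
        folded.append((cycles - math.floor(cycles), m))
    if sort:
        # order the measurements by phase (stable for equal phases)
        folded.sort(key=lambda pm: pm[0])
    if wrap:
        # prepend a copy shifted back by one cycle so phase 0.0 sits mid-plot
        folded = [(p - 1.0, m) for p, m in folded] + folded
    phases = [p for p, _ in folded]
    folded_mags = [m for _, m in folded]
    return phases, folded_mags


# toy light curve; values are made up for the example
times = [0.0, 0.5, 1.25, 2.75, 4.0]
mags = [10.0, 10.2, 10.1, 10.3, 10.0]
phases, fmags = phase_fold(times, mags, period=2.0, epoch=0.0)
```

With wrapping enabled the phases run from -1.0 up to (but not including) 1.0, so a plot window such as plotphaselim can be centered on phase 0.0.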

astrobase.plotbase.
skyview_stamp
(ra, decl, survey='DSS2 Red', scaling='Linear', sizepix=300, flip=True, convolvewith=None, forcefetch=False, cachedir='~/.astrobase/stampcache', timeout=45.0, retry_failed=False, savewcsheader=True, verbose=False)[source]¶ This downloads a DSS FITS stamp centered on the coordinates specified.
This wraps the function astrobase.services.skyview.get_stamp(), which downloads Digitized Sky Survey stamps in FITS format from the NASA SkyView service: https://skyview.gsfc.nasa.gov/current/cgi/query.pl
Also adds some useful operations on top of the FITS file returned.
Parameters:  ra,decl (float) – The center coordinates for the stamp in decimal degrees.
 survey (str) – The survey name to get the stamp from. This is one of the values in the ‘SkyView Surveys’ option boxes on the SkyView webpage. Currently, we’ve only tested using ‘DSS2 Red’ as the value for this kwarg, but the other ones should work in principle.
 scaling (str) – This is the pixel value scaling function to use. Can be any of the strings (“Log”, “Linear”, “Sqrt”, “HistEq”).
 sizepix (int) – Size of the requested stamp, in pixels. (DSS scale is ~1 arcsec/px).
 flip (bool) – Will flip the downloaded image top to bottom. This should usually be True because matplotlib and FITS have different image coord origin conventions. Alternatively, set this to False and use origin=’lower’ in any call to matplotlib.pyplot.imshow when plotting this image.
 convolvewith (astropy.convolution Kernel object or None) –
If convolvewith is an astropy.convolution Kernel object from:
http://docs.astropy.org/en/stable/convolution/kernels.html
then, this function will return the stamp convolved with that kernel. This can be useful to see effects of widefield telescopes (like the HATNet and HATSouth lenses) degrading the nominal 1 arcsec/px of DSS, causing blending of targets and any variability.
 forcefetch (bool) – If True, will disregard any existing cached copies of the stamp already downloaded corresponding to the requested center coordinates and redownload the FITS from the SkyView service.
 cachedir (str) – This is the path to the astrobase cache directory. All downloaded FITS stamps are stored here as .fits.gz files so we can immediately respond with the cached copy when a request is made for a coordinate center that’s already been downloaded.
 timeout (float) – Sets the timeout in seconds to wait for a response from the NASA SkyView service.
 retry_failed (bool) – If the initial request to SkyView fails, and this is True, will retry until it succeeds.
 savewcsheader (bool) – If this is True, also returns the WCS header of the downloaded FITS stamp in addition to the FITS image itself. Useful for projecting object coordinates onto image xy coordinates for visualization.
 verbose (bool) – If True, indicates progress.
Returns: This returns based on the value of savewcsheader:
 If savewcsheader=True, returns a tuple: (FITS stamp image as a numpy array, FITS header)
 If savewcsheader=False, returns only the FITS stamp image as numpy array.
 If the stamp retrieval fails, returns None.
Return type: tuple or array or None
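Because the return value has three possible shapes, calling code may want to normalize it before use. The helper below is a minimal sketch of that idea; unpack_stamp is a hypothetical name, not part of astrobase, and the stand-in values in the example are plain lists and dicts rather than real FITS data.

```python
def unpack_stamp(result):
    """Normalize the three documented return shapes to (image, header)."""
    if result is None:
        # stamp retrieval failed
        return None, None
    if isinstance(result, tuple):
        # savewcsheader=True: (image array, FITS header)
        image, header = result
        return image, header
    # savewcsheader=False: image array only
    return result, None


# stand-in values for illustration only
image, header = unpack_stamp(([[1, 2], [3, 4]], {"NAXIS": 2}))
```

A caller can then test `image is None` once, instead of branching on the value of savewcsheader at every call site.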

astrobase.plotbase.
fits_finder_chart
(fitsfile, outfile, fitsext=0, wcsfrom=None, scale=<astropy.visualization.interval.ZScaleInterval object>, stretch=<astropy.visualization.stretch.LinearStretch object>, colormap=<matplotlib.colors.LinearSegmentedColormap object>, findersize=None, finder_coordlimits=None, overlay_ra=None, overlay_decl=None, overlay_pltopts={'marker': 'o', 'markeredgecolor': 'red', 'markeredgewidth': 2.0, 'markerfacecolor': 'none', 'markersize': 10.0}, overlay_zoomcontain=False, grid=False, gridcolor='k')[source]¶ This makes a finder chart for a given FITS with an optional object position overlay.
Parameters:  fitsfile (str) – fitsfile is the FITS file to use to make the finder chart.
 outfile (str) – outfile is the name of the output file. This can be a png or pdf or whatever else matplotlib can write given a filename and extension.
 fitsext (int) – Sets the FITS extension in fitsfile to use to extract the image array from.
 wcsfrom (str or None) – If wcsfrom is None, the WCS to transform the RA/Dec to pixel x/y will be taken from the FITS header of fitsfile. If this is not None, it must be a FITS or similar file that contains a WCS header in its first extension.
 scale (astropy.visualization.Interval object) – scale sets the normalization for the FITS pixel values. This is an astropy.visualization Interval object. See http://docs.astropy.org/en/stable/visualization/normalization.html for details on scale and stretch objects.
 stretch (astropy.visualization.Stretch object) – stretch sets the stretch function for mapping FITS pixel values to output pixel values. This is an astropy.visualization Stretch object. See http://docs.astropy.org/en/stable/visualization/normalization.html for details on scale and stretch objects.
 colormap (matplotlib Colormap object) – colormap is a matplotlib color map object to use for the output image.
 findersize (None or tuple of two ints) – If findersize is None, the output image size will be set by the NAXIS1 and NAXIS2 keywords in the input fitsfile FITS header. Otherwise, findersize must be a tuple with the intended x and y size of the image in inches (all output images will use a DPI = 100).
 finder_coordlimits (list of four floats or None) – If not None, finder_coordlimits sets x and y limits for the plot, effectively zooming it in if these are smaller than the dimensions of the FITS image. This should be a list of the form: [minra, maxra, mindecl, maxdecl] all in decimal degrees.
 overlay_ra,overlay_decl (np.array or None) – overlay_ra and overlay_decl are ndarrays containing the RA and Dec values to overplot on the image as an overlay. If these are both None, then no overlay will be plotted.
 overlay_pltopts (dict) – overlay_pltopts controls how the overlay points will be plotted. This is a dict with standard matplotlib marker, etc. kwargs as key-val pairs, e.g. ‘markersize’, ‘markerfacecolor’, etc. The default options make red outline circles at the location of each object in the overlay.
 overlay_zoomcontain (bool) – overlay_zoomcontain controls if the finder chart will be zoomed to just contain the overlaid points. Everything outside the footprint of these points will be discarded.
 grid (bool) – grid sets if a grid will be made on the output image.
 gridcolor (str) – gridcolor sets the color of the grid lines. This is a usual matplotlib color spec string.
Returns: The filename of the generated output image if successful. None otherwise.
Return type: str or None
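The finder_coordlimits list has the form [minra, maxra, mindecl, maxdecl]. As a rough sketch, such a list can be derived from the overlay coordinates plus some padding. The helper name coordlimits_for_overlay and the pad_deg parameter are hypothetical, and this simple version ignores RA wrap-around at 0/360 degrees and the cos(Dec) foreshortening of RA.

```python
def coordlimits_for_overlay(overlay_ra, overlay_decl, pad_deg=0.01):
    """Build [minra, maxra, mindecl, maxdecl] enclosing the overlay points."""
    if len(overlay_ra) == 0 or len(overlay_ra) != len(overlay_decl):
        raise ValueError("need equal-length, non-empty RA and Dec sequences")
    return [
        min(overlay_ra) - pad_deg,
        max(overlay_ra) + pad_deg,
        min(overlay_decl) - pad_deg,
        max(overlay_decl) + pad_deg,
    ]


# two made-up object positions in decimal degrees
limits = coordlimits_for_overlay([290.0, 290.5], [-45.0, -44.5], pad_deg=0.25)
```

The resulting list could be passed as finder_coordlimits to zoom the chart to the region around the overlay points.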

astrobase.plotbase.
plot_periodbase_lsp
(lspinfo, outfile=None, plotdpi=100)[source]¶ Makes a plot of periodograms obtained from periodbase functions.
This takes the output dict produced by any astrobase.periodbase periodfinder function or a pickle filename containing such a dict and makes a periodogram plot.
Parameters:  lspinfo (dict or str) –
If lspinfo is a dict, it must be a dict produced by an astrobase.periodbase periodfinder function or a dict from your own periodfinder function or routine that is of the form below with at least these keys:
{'periods': np.array of all periods searched by the periodfinder,
 'lspvals': np.array of periodogram power value for each period,
 'bestperiod': a float value that is the period with the highest peak in the periodogram, i.e. the most-likely actual period,
 'method': a three-letter code naming the periodfinder used; must be one of the keys in the `METHODLABELS` dict above,
 'nbestperiods': a list of the periods corresponding to periodogram peaks (`nbestlspvals` below) to annotate on the periodogram plot so they can be called out visually,
 'nbestlspvals': a list of the power values associated with periodogram peaks to annotate on the periodogram plot so they can be called out visually; should be the same length as `nbestperiods` above}
If lspinfo is a str, then it must be a path to a pickle file that contains a dict of the form described above.
 outfile (str or None) – If this is a str, will write the periodogram plot to the file specified by this string. If this is None, will write to a file called ‘lspplot.png’ in the current working directory.
 plotdpi (int) – Sets the resolution in DPI of the output periodogram plot PNG file.
Returns: Absolute path to the periodogram plot file created.
Return type: str
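For results from your own periodfinder, a dict with the required keys can be assembled along the following lines. This is a sketch under assumptions: make_lspinfo is a hypothetical helper, 'gls' is assumed to be a valid key in METHODLABELS (that dict is not shown in this section), plain lists stand in for the np.array values the plotting function expects, and the "peaks" are simply the highest power values rather than the result of real peak finding.

```python
def make_lspinfo(periods, lspvals, method="gls", nbest=3):
    """Assemble a periodogram dict with the keys listed above."""
    if len(periods) == 0 or len(periods) != len(lspvals):
        raise ValueError("periods and lspvals must be non-empty and equal length")
    # rank period indices by descending periodogram power
    order = sorted(range(len(periods)), key=lambda i: lspvals[i], reverse=True)
    best = order[:nbest]
    return {
        "periods": list(periods),    # convert to np.array before plotting
        "lspvals": list(lspvals),    # convert to np.array before plotting
        "bestperiod": periods[order[0]],
        "method": method,            # assumed to be a METHODLABELS key
        "nbestperiods": [periods[i] for i in best],
        "nbestlspvals": [lspvals[i] for i in best],
    }


# made-up periodogram values for illustration
info = make_lspinfo([1.0, 2.0, 3.0, 4.0], [0.1, 0.9, 0.3, 0.5], nbest=2)
```

Taking the top values this way can select adjacent samples on the same peak, so a real periodfinder would normally separate peaks before filling nbestperiods.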
astrobase.lcproc package¶
This package contains functions that help drive large batchprocessing jobs for light curves.
This top level module contains functions to import custom light curve formats. Once you have your own LC format registered with lcproc, all of the submodules in this package can be used to process these LCs:
 astrobase.lcproc.awsrun: contains driver functions that run batch-processing of light curve period-finding and checkplot making using resources from Amazon AWS: EC2 for processing, S3 for storage, and SQS for queuing work.
 astrobase.lcproc.catalogs: contains functions that generate catalogs from collections of light curves, make KD-Trees for fast spatial matching, and augment these catalogs from the rich object information contained in checkplot pickles.
 astrobase.lcproc.checkplotgen: contains functions that drive batch-jobs to make checkplot pickles for a large collection of light curves (and optional period-finding results).
 astrobase.lcproc.checkplotproc: contains functions that add extra information to checkplot pickles, including color-magnitude diagrams, updating neighbor light curves, and cross-matches to external catalogs.
 astrobase.lcproc.epd: contains functions that drive batch-jobs for External Parameter Decorrelation on collections of light curves.
 astrobase.lcproc.lcbin: contains functions that drive batch-jobs for time-binning collections of light curves to a specified cadence.
 astrobase.lcproc.lcpfeatures: contains functions that drive batch-jobs to calculate features of phased light curves, if period-finding results for these are available. These periodic light curve features can be used later to do variable star classification.
 astrobase.lcproc.lcsfeatures: contains functions that drive batch-jobs to calculate color, coordinate, and neighbor proximity features for a collection of light curves. These can be used later to do variable star classification.
 astrobase.lcproc.lcvfeatures: contains functions that drive batch-jobs to calculate non-periodic features of unphased light curves (e.g. time-series moments and variability indices). These can be used later to do variable star classification.
 astrobase.lcproc.periodsearch: contains functions that drive batch-jobs to run period-finding using any of the methods in astrobase.periodbase on collections of light curves. These produce period-finder result pickles that can be used transparently by the functions in astrobase.lcproc.checkplotgen and astrobase.lcproc.checkplotproc to generate and update checkplot pickles.
 astrobase.lcproc.tfa: contains functions that drive the application of the Trend Filtering Algorithm (TFA) to large collections of light curves.
 astrobase.lcproc.varthreshold: contains functions that help decide where to place thresholds on several variability indices for a collection of light curves to maximize recovery of actual variable stars.

astrobase.lcproc.
register_lcformat
(formatkey, fileglob, timecols, magcols, errcols, readerfunc_module, readerfunc, readerfunc_kwargs=None, normfunc_module=None, normfunc=None, normfunc_kwargs=None, magsarefluxes=False, overwrite_existing=False, lcformat_dir='~/.astrobase/lcformatjsons')[source]¶ This adds a new LC format to the astrobase LC format registry.
Allows handling of custom format light curves for astrobase lcproc drivers. Once the format is successfully registered, light curves should work transparently with all of the functions in this module, by simply calling them with the formatkey in the lcformat keyword argument.
LC format specifications are generated as JSON files. astrobase comes with several of these in <astrobase install path>/data/lcformats. LC formats you add by using this function will have their specifiers written to the ~/.astrobase/lcformatjsons directory in your home directory.
Parameters:  formatkey (str) – A str used as the unique ID of this LC format for all lcproc functions and can be used to look it up later and import the correct functions needed to support it for lcproc operations. For example, we use ‘kepfits’ as the specifier for Kepler FITS light curves, which can be read by the astrobase.astrokep.read_kepler_fitslc function as specified by the <astrobase install path>/data/lcformats/kepfits.json LC format specification JSON produced by register_lcformat.
 fileglob (str) – The default UNIX fileglob to use to search for light curve files in this LC format. This is a string like ‘whatever???.*??.lc’.
 timecols,magcols,errcols (list of str) –
These are all lists of strings indicating which keys in the lcdict produced by your readerfunc will be extracted and used by lcproc functions for processing. The lists must all have the same length, e.g. if timecols = [‘timecol1’, ‘timecol2’], then magcols must be something like [‘magcol1’, ‘magcol2’] and errcols must be something like [‘errcol1’, ‘errcol2’]. This allows you to process multiple apertures or multiple types of measurements in one go.
Each element in these lists can be a simple key, e.g. ‘time’ (which would correspond to lcdict[‘time’]), or a composite key, e.g. ‘aperture1.times.rjd’ (which would correspond to lcdict[‘aperture1’][‘times’][‘rjd’]). See the examples in the lcformat specification JSON files in <astrobase install path>/data/lcformats.
 readerfunc_module (str) –
This is either:
 a Python module import path, e.g. ‘astrobase.lcproc.catalogs’ or
 a path to a Python file, e.g. ‘/astrobase/hatsurveys/hatlc.py’
that contains the Python module that contains functions used to open (and optionally normalize) a custom LC format that’s not natively supported by astrobase.
 readerfunc (str) –
This is the function name in readerfunc_module to use to read light curves in the custom format. This MUST always return a dictionary (the ‘lcdict’) with the following signature (the keys listed below are required, but others are allowed):
{'objectid': this object's identifier as a string,
 'objectinfo': {'ra': this object's right ascension in decimal deg,
                'decl': this object's declination in decimal deg,
                'ndet': the number of observations in this LC,
                'objectid': the object ID again for legacy reasons},
 ...other time columns, mag columns go in as their own keys}
 readerfunc_kwargs (dict or None) – This is a dictionary containing any kwargs to pass through to the light curve reader function.
 normfunc_module (str or None) –
This is one of:
 a Python module import path, e.g. ‘astrobase.lcproc.catalogs’, or
 a path to a Python file, e.g. ‘/astrobase/hatsurveys/hatlc.py’, that contains the functions used to normalize a custom LC format that’s not natively supported by astrobase, or
 None, in which case we’ll use default normalization.
 normfunc (str or None) –
This is the function name in normfunc_module to use to normalize light curves in the custom format. If None, the default normalization method used by lcproc is to find gaps in the timeseries, normalize measurements grouped by these gaps to zero, then normalize the entire magnitude time series to global time series median using the astrobase.lcmath.normalize_magseries function.
If this is provided, the normalization function should take and return an lcdict of the same form as that produced by readerfunc above. For an example of a specific normalization function, see normalize_lcdict_by_inst in the astrobase.hatsurveys.hatlc module.
 normfunc_kwargs (dict or None) – This is a dictionary containing any kwargs to pass through to the light curve normalization function.
 magsarefluxes (bool) – If this is True, then all lcproc functions will treat the measurement columns in the lcdict produced by your readerfunc as flux instead of mags, so things like default normalization and sigmaclipping will be done correctly. If this is False, magnitudes will be treated as magnitudes.
 overwrite_existing (bool) – If this is True, this function will overwrite any existing LC format specification JSON with the same name as that provided in the formatkey arg. This can be used to update LC format specifications while keeping the formatkey the same.
 lcformat_dir (str) – This specifies the directory where the LC format specification JSON produced by this function will be written. By default, this goes to the .astrobase/lcformatjsons directory in your home directory.
Returns: Returns the file path to the generated LC format specification JSON file.
Return type: str
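To make the lcdict layout and the composite-key convention concrete, here is a small sketch. Both function names are hypothetical, the object ID and measurements are made up, and resolve_colkey only mirrors the dotted-key description above; it is not claimed to be how lcproc performs the lookup internally.

```python
from functools import reduce


def resolve_colkey(lcdict, colkey):
    """Look up a simple or dot-separated composite key in an lcdict."""
    # 'a.b.c' walks lcdict['a']['b']['c']; a simple key is a one-step walk
    return reduce(lambda d, k: d[k], colkey.split("."), lcdict)


def make_lcdict(objectid, ra, decl, rjd, mags, errs):
    """Build a minimal lcdict with the required keys plus one aperture."""
    return {
        "objectid": objectid,
        "objectinfo": {
            "ra": ra,
            "decl": decl,
            "ndet": len(rjd),
            "objectid": objectid,
        },
        # measurement columns live under their own keys
        "aperture1": {"times": {"rjd": rjd}, "mags": mags, "errs": errs},
    }


lcdict = make_lcdict("EXAMPLE-0001", 290.0, -45.0,
                     [1.0, 2.0], [10.1, 10.2], [0.01, 0.01])
# matching column lists, one entry per aperture
timecols = ["aperture1.times.rjd"]
magcols = ["aperture1.mags"]
errcols = ["aperture1.errs"]
```

A reader function registered through readerfunc would return a dict shaped like lcdict, and the three column lists would be passed as timecols, magcols, and errcols.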

astrobase.lcproc.
get_lcformat
(formatkey, use_lcformat_dir=None)[source]¶ This loads an LC format description from a previouslysaved JSON file.
Parameters:  formatkey (str) – The key used to refer to the LC format. This is part of the JSON file’s name, e.g. the format key ‘hatcsv’ maps to the format JSON file: ‘<astrobase install path>/data/lcformats/hatcsv.json’.
 use_lcformat_dir (str or None) –
If provided, must be the path to a directory that contains the corresponding lcformat JSON file for formatkey. This function looks for the lcformat JSON file corresponding to the given formatkey in the following order:
 first, in the directory specified in this kwarg (if it is not None),
 if not found there, in the home directory: ~/.astrobase/lcformatjsons
 if not found there, in: <astrobase install path>/data/lcformats
Returns: A tuple of the following form is returned:
(fileglob : the file glob of the associated LC files,
 readerfunc_in : the imported Python function for reading LCs,
 timecols : list of time col keys to get from the lcdict,
 magcols : list of mag col keys to get from the lcdict,
 errcols : list of err col keys to get from the lcdict,
 magsarefluxes : True if the measurements are fluxes not mags,
 normfunc_in : the imported Python function for normalizing LCs)
All astrobase.lcproc functions can then use this tuple to dynamically import your LC reader and normalization functions to work with your LC format transparently.
Return type: tuple
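The directory search order described for use_lcformat_dir can be sketched as a simple first-match lookup. The function find_lcformat_json and its parameters are hypothetical; in particular, the install directory is passed in explicitly here because the real install path is not known from this section.

```python
import os


def find_lcformat_json(formatkey, install_lcformat_dir, use_lcformat_dir=None,
                       home_lcformat_dir="~/.astrobase/lcformatjsons"):
    """Return the first '<formatkey>.json' found in the search order, or None."""
    candidates = []
    if use_lcformat_dir is not None:
        # 1. the user-specified directory, when given
        candidates.append(use_lcformat_dir)
    # 2. the per-user directory under the home directory
    candidates.append(os.path.expanduser(home_lcformat_dir))
    # 3. the directory shipped with the package (supplied by the caller here)
    candidates.append(install_lcformat_dir)
    for dirpath in candidates:
        jsonpath = os.path.join(dirpath, formatkey + ".json")
        if os.path.exists(jsonpath):
            return jsonpath
    return None
```

A format JSON placed in an earlier directory shadows one with the same formatkey in a later directory, which is one way to override a bundled format locally.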
Submodules¶
astrobase.lcproc.awsrun module¶
This contains lcproc worker loops useful for AWS processing of light curves.
The basic workflow is:
LCs from S3 -> SQS -> worker loop -> products back to S3 | result JSON to SQS
All functions here assume AWS credentials have been loaded already using awscli as described at:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html
General recommendations:
 use t3.medium or t3.micro instances for runcp_consumer_loop. Checkplot making isn’t really a CPU intensive activity, so using these will be cheaper.
 use c5.2xlarge or above instances for runpf_consumer_loop. Periodfinders require a decent number of fast cores, so a spot fleet of these instances should be costeffective.
 you may want a t3.micro instance running in the same region and VPC as your worker node instances to serve as a head node driving the producer_loop functions. This can be done from a machine outside AWS, but you’ll incur (probably tiny) charges for network egress from the output queues.
 It’s best not to download results from S3 as soon as they’re produced. Leave them on S3 until everything is done, then use rclone (https://rclone.org) to sync them back to your machines using --transfers <large number>.
The user_data and instance_user_data kwargs for the make_ec2_nodes and make_spot_fleet_cluster functions can be used to start processing loops as soon as EC2 brings up the VM instance. This is especially useful for spot fleets set to maintain a target capacity, since worker nodes will be terminated and automatically replaced. Bringing up the processing loop at instance start up makes it easy to continue processing light curves exactly where you left off without having to manually intervene.
Example script for user_data bringing up a checkplotmaking loop on instance creation (assuming we’re using Amazon Linux 2):
#!/bin/bash
cat << 'EOF' > /home/ec2-user/launchruncp.sh
#!/bin/bash
sudo yum -y install python3-devel gcc-gfortran jq htop emacs-nox git
# create the virtualenv
python3 -m venv /home/ec2-user/py3
# get astrobase
cd /home/ec2-user
git clone https://github.com/waqasbhatti/astrobase
# install it
cd /home/ec2-user/astrobase
/home/ec2-user/py3/bin/pip install pip setuptools numpy -U
/home/ec2-user/py3/bin/pip install -e .[aws]
# make the work dir
mkdir /home/ec2-user/work
cd /home/ec2-user/work
# wait a bit for the instance info to be populated
sleep 5
# set some environ vars for boto3 and the processing loop
export AWS_DEFAULT_REGION=`curl --silent http://169.254.169.254/latest/dynamic/instance-identity/document/ | jq '.region' | tr -d '"'`
export NCPUS=`lscpu -J | jq ".lscpu[3].data|tonumber"`
# launch the processor loops, one per CPU
for s in `seq $NCPUS`; do nohup /home/ec2-user/py3/bin/python3 -u -c "from astrobase.lcproc import awsrun as lcp; lcp.runcp_consumer_loop('https://queueurl','.','s3://path/to/lclist.pkl')" > runcp-$s-loop.out & done
EOF
# run the script we just created as ec2-user
chown ec2-user /home/ec2-user/launchruncp.sh
su ec2-user -c 'bash /home/ec2-user/launchruncp.sh'
Here’s a similar script for a runpf consumer loop. We launch only a single instance of the loop because runpf will use all CPUs by default for its periodfinder parallelized functions:
#!/bin/bash
cat << 'EOF' > /home/ec2-user/launchrunpf.sh
#!/bin/bash
sudo yum -y install python3-devel gcc-gfortran jq htop emacs-nox git
python3 -m venv /home/ec2-user/py3
cd /home/ec2-user
git clone https://github.com/waqasbhatti/astrobase
cd /home/ec2-user/astrobase
/home/ec2-user/py3/bin/pip install pip setuptools numpy -U
/home/ec2-user/py3/bin/pip install -e .[aws]
mkdir /home/ec2-user/work
cd /home/ec2-user/work
# wait a bit for the instance info to be populated
sleep 5
export AWS_DEFAULT_REGION=`curl --silent http://169.254.169.254/latest/dynamic/instance-identity/document/ | jq '.region' | tr -d '"'`
export NCPUS=`lscpu -J | jq ".lscpu[3].data|tonumber"`
# launch the processes
nohup /home/ec2-user/py3/bin/python3 -u -c "from astrobase.lcproc import awsrun as lcp; lcp.runpf_consumer_loop('https://inputqueueurl','.')" > runpfloop.out &
EOF
chown ec2-user /home/ec2-user/launchrunpf.sh
su ec2-user -c 'bash /home/ec2-user/launchrunpf.sh'

astrobase.lcproc.awsrun.
kill_handler
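(sig, frame)[source]¶ This raises a KeyboardInterrupt when a kill signal comes in. Note that SIGKILL itself cannot be caught by a signal handler, so this applies only to catchable signals such as SIGTERM.
This is a handler for use with the Python signal.signal function.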
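A handler of this kind is typically wired up with signal.signal so that a termination request unwinds the processing loop the same way Ctrl+C does. The sketch below shows the pattern; which signals astrobase actually registers is an assumption here, and install_kill_handler is a hypothetical helper name.

```python
import signal


def kill_handler(sig, frame):
    """Convert a catchable termination signal into KeyboardInterrupt."""
    raise KeyboardInterrupt


def install_kill_handler():
    """Register the handler; must be called from the main thread."""
    # SIGINT and SIGTERM can be handled; SIGKILL cannot be caught at all
    signal.signal(signal.SIGINT, kill_handler)
    signal.signal(signal.SIGTERM, kill_handler)
```

A worker loop wrapped in try/except KeyboardInterrupt can then clean up in one place regardless of which of the two signals arrived.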

astrobase.lcproc.awsrun.
cache_clean_handler
(min_age_hours=1)[source]¶ This periodically cleans up the ~/.astrobase cache to save us from diskspace doom.
Parameters: min_age_hours (int) – Files older than this number of hours from the current time will be deleted.
Returns: Nothing.
Return type: None
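An age-based cleanup like this can be sketched with the standard library alone. The function below is an illustration rather than the astrobase implementation: clean_old_files is a hypothetical name, it uses file modification time as the age, and it returns the list of deleted paths to make the behavior easy to check.

```python
import os
import time


def clean_old_files(cachedir, min_age_hours=1, now=None):
    """Delete files under `cachedir` older than `min_age_hours`; return them."""
    now = time.time() if now is None else now
    cutoff = now - min_age_hours * 3600.0
    removed = []
    for root, _dirs, files in os.walk(os.path.expanduser(cachedir)):
        for fname in files:
            fpath = os.path.join(root, fname)
            # a file is stale when its last modification predates the cutoff
            if os.path.getmtime(fpath) < cutoff:
                os.remove(fpath)
                removed.append(fpath)
    return removed
```

Calling it periodically from a long-running worker keeps the cache directory from growing without bound.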

astrobase.lcproc.awsrun.
shutdown_check_handler
()[source]¶ This checks the AWS instance data URL to see if there’s a pending shutdown for the instance.
This is useful for AWS spot instances. If there is a pending shutdown posted to the instance data URL, we’ll use the result of this function to break out of the processing loop and shut everything down ASAP before the instance dies.
Returns:  True if the instance is going to die soon.
 False if the instance is still safe.
Return type: bool
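One way to perform such a check is to poll the EC2 instance metadata path for spot interruption notices, which returns HTTP 404 when nothing is pending. The sketch below assumes that path; whether astrobase polls exactly this URL is not confirmed by this section, the function names are hypothetical, and instances that require IMDSv2 session tokens are not handled.

```python
import urllib.error
import urllib.request

# EC2 metadata path for spot interruption notices (assumed to be the one used)
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def fetch_status(url, timeout=1.0):
    """Return the HTTP status for `url`, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # e.g. 404 when no interruption is scheduled
        return exc.code
    except (urllib.error.URLError, OSError):
        # not on EC2, or the metadata service did not answer in time
        return None


def shutdown_pending(fetch=fetch_status):
    """True if the metadata service reports a pending spot interruption."""
    return fetch(SPOT_ACTION_URL) == 200
```

The fetch argument is injectable so the decision logic can be exercised off AWS, for example with `shutdown_pending(lambda url: 404)`.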

astrobase.lcproc.awsrun.
runcp_producer_loop
(lightcurve_list, input_queue, input_bucket, result_queue, result_bucket, pfresult_list=None, runcp_kwargs=None, process_list_slice=None, purge_queues_when_done=False, delete_queues_when_done=False, download_when_done=True, save_state_when_done=True, s3_client=None, sqs_client=None)[source]¶ This sends checkplot making tasks to the input queue and monitors the result queue for task completion.
Parameters:  lightcurve_list (str or list of str) – This is either a string pointing to a file containing a list of light curves filenames to process or the list itself. The names must correspond to the full filenames of files stored on S3, including all prefixes, but not include the ‘s3://<bucket name>/’ bit (these will be added automatically).
 input_queue (str) – This is the name of the SQS queue which will receive processing tasks generated by this function. The queue URL will automatically be obtained from AWS.
 input_bucket (str) – The name of the S3 bucket containing the light curve files to process.
 result_queue (str) – This is the name of the SQS queue that this function will listen to for messages from the workers as they complete processing on their input elements. This function will attempt to match input sent to the input_queue with results coming into the result_queue so it knows how many objects have been successfully processed. If this function receives task results that aren’t in its own input queue, it will acknowledge them so they complete successfully, but not download them automatically. This handles leftover tasks completing from a previous run of this function.
 result_bucket (str) – The name of the S3 bucket which will receive the results from the workers.
 pfresult_list (list of str or None) – This is a list of periodfinder result pickle S3 URLs associated with each light curve. If provided, this will be used to add in phased light curve plots to each checkplot pickle. If this is None, the worker loop will produce checkplot pickles that only contain object information, neighbor information, and unphased light curves.
 runcp_kwargs (dict) – This is a dict used to pass any extra keyword arguments to the lcproc.checkplotgen.runcp function that will be run by the worker loop.
 process_list_slice (list) –
This is used to index into the input light curve list so a subset of the full list can be processed in this specific run of this function.
Use None for a slice index elem to emulate single slice spec behavior:
process_list_slice = [10, None] -> lightcurve_list[10:]
process_list_slice = [None, 500] -> lightcurve_list[:500]
 purge_queues_when_done (bool) – If this is True, and this function exits (either when all done, or when it is interrupted with a Ctrl+C), all outstanding elements in the input/output queues that have not yet been acknowledged by workers or by this function will be purged. This effectively cancels all outstanding work.
 delete_queues_when_done (bool) – If this is True, and this function exits (either when all done, or when it is interrupted with a Ctrl+C), all outstanding work items will be purged from the input/output queues and the queues themselves will be deleted.
 download_when_done (bool) – If this is True, the generated checkplot pickle for each input work item will be downloaded immediately to the current working directory when the worker functions report they’re done with it.
 save_state_when_done (bool) – If this is True, will save the current state of the work item queue and the work items acknowledged as completed to a pickle in the current working directory. Call the runcp_producer_loop_savedstate function below to resume processing from this saved state later.
 s3_client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its S3 download operations. Alternatively, pass in an existing boto3.Client instance to reuse it here.
 sqs_client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its SQS operations. Alternatively, pass in an existing boto3.Client instance to reuse it here.
Returns: Returns the current work state as a dict or str path to the generated work state pickle depending on if save_state_when_done is True.
Return type: dict or str
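
The process_list_slice convention described above can be sketched in plain Python. This only illustrates the documented [start, end] behavior and is not the library's internal code; the helper name apply_list_slice and the file names are hypothetical.

```python
def apply_list_slice(lightcurve_list, process_list_slice=None):
    """Return the subset of lightcurve_list selected by a [start, end] pair.

    Hypothetical helper: a None element leaves that side of the slice open,
    mirroring lightcurve_list[start:end].
    """
    if process_list_slice is None:
        return list(lightcurve_list)
    start, end = process_list_slice
    return list(lightcurve_list[start:end])


# made-up S3 keys, used only for illustration
lcs = ['lcs/object-%04d.sqlite.gz' % i for i in range(1000)]

tail = apply_list_slice(lcs, [10, None])    # same as lcs[10:]
head = apply_list_slice(lcs, [None, 500])   # same as lcs[:500]
```

Splitting a long list this way lets several producer runs each take a disjoint part of the same input list.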

astrobase.lcproc.awsrun.
runcp_producer_loop_savedstate
(use_saved_state=None, lightcurve_list=None, input_queue=None, input_bucket=None, result_queue=None, result_bucket=None, pfresult_list=None, runcp_kwargs=None, process_list_slice=None, download_when_done=True, purge_queues_when_done=True, save_state_when_done=True, delete_queues_when_done=False, s3_client=None, sqs_client=None)[source]¶ This wraps the function above to allow for loading previous state from a file.
Parameters:  use_saved_state (str or None) – This is the path to the saved state pickle file produced by a previous run of runcp_producer_loop. Will get all of the arguments to run another instance of the loop from that pickle file. If this is None, you MUST provide all of the appropriate arguments to that function.
 lightcurve_list (str or list of str or None) – This is either a string pointing to a file containing a list of light curves filenames to process or the list itself. The names must correspond to the full filenames of files stored on S3, including all prefixes, but not include the ‘s3://<bucket name>/’ bit (these will be added automatically).
 input_queue (str or None) – This is the name of the SQS queue which will receive processing tasks generated by this function. The queue URL will automatically be obtained from AWS.
 input_bucket (str or None) – The name of the S3 bucket containing the light curve files to process.
 result_queue (str or None) – This is the name of the SQS queue that this function will listen to for messages from the workers as they complete processing on their input elements. This function will attempt to match input sent to the input_queue with results coming into the result_queue so it knows how many objects have been successfully processed. If this function receives task results that aren’t in its own input queue, it will acknowledge them so they complete successfully, but not download them automatically. This handles leftover tasks completing from a previous run of this function.
 result_bucket (str or None) – The name of the S3 bucket which will receive the results from the workers.
 pfresult_list (list of str or None) – This is a list of periodfinder result pickle S3 URLs associated with each light curve. If provided, this will be used to add in phased light curve plots to each checkplot pickle. If this is None, the worker loop will produce checkplot pickles that only contain object information, neighbor information, and unphased light curves.
 runcp_kwargs (dict or None) – This is a dict used to pass any extra keyword arguments to the lcproc.checkplotgen.runcp function that will be run by the worker loop.
 process_list_slice (list or None) –
This is used to index into the input light curve list so a subset of the full list can be processed in this specific run of this function.
Use None for a slice index elem to emulate single slice spec behavior:
process_list_slice = [10, None] -> lightcurve_list[10:]
process_list_slice = [None, 500] -> lightcurve_list[:500]
 purge_queues_when_done (bool or None) – If this is True, and this function exits (either when all done, or when it is interrupted with a Ctrl+C), all outstanding elements in the input/output queues that have not yet been acknowledged by workers or by this function will be purged. This effectively cancels all outstanding work.
 delete_queues_when_done (bool or None) – If this is True, and this function exits (either when all done, or when it is interrupted with a Ctrl+C), all outstanding work items will be purged from the input/output queues and the queues themselves will be deleted.
 download_when_done (bool or None) – If this is True, the generated checkplot pickle for each input work item will be downloaded immediately to the current working directory when the worker functions report they’re done with it.
 save_state_when_done (bool or None) – If this is True, will save the current state of the work item queue and the work items acknowledged as completed to a pickle in the current working directory. Pass the path of that pickle to this function as use_saved_state to resume processing from the saved state later.
 s3_client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its S3 download operations. Alternatively, pass in an existing boto3.Client instance to reuse it here.
 sqs_client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its SQS operations. Alternatively, pass in an existing boto3.Client instance to reuse it here.
Returns: Returns the current work state as a dict or str path to the generated work state pickle depending on if save_state_when_done is True.
Return type: dict or str
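
As a rough sketch of the "saved state or explicit arguments" rule described for use_saved_state, the following shows one way such a wrapper could choose its arguments. The helper name resolve_loop_kwargs, the 'args' key, and the set of required names are assumptions made for illustration; the layout of the real state pickle is not documented here and may differ.

```python
import os
import pickle
import tempfile

# Arguments that have no usable default (an assumption based on the
# positional parameters of the producer loop signature).
REQUIRED = ('lightcurve_list', 'input_queue', 'input_bucket',
            'result_queue', 'result_bucket')


def resolve_loop_kwargs(use_saved_state=None, **explicit_kwargs):
    """Pick loop arguments from a saved-state pickle or from explicit kwargs.

    Hypothetical helper: assumes the pickle holds a dict whose 'args' key
    maps argument names to values.
    """
    if use_saved_state is not None:
        with open(use_saved_state, 'rb') as infd:
            saved = pickle.load(infd)
        return dict(saved['args'])

    # no saved state: every required argument must be given explicitly
    missing = [key for key in REQUIRED if explicit_kwargs.get(key) is None]
    if missing:
        raise ValueError('no saved state given; missing: ' + ', '.join(missing))
    return explicit_kwargs


# round-trip a made-up state pickle through a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    state_path = os.path.join(tmpdir, 'example-state.pkl')
    with open(state_path, 'wb') as outfd:
        pickle.dump({'args': {'input_queue': 'example-in'}}, outfd)
    restored = resolve_loop_kwargs(use_saved_state=state_path)
```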

astrobase.lcproc.awsrun.
runcp_consumer_loop
(in_queue_url, workdir, lclist_pkl_s3url, lc_altexts=('', ), wait_time_seconds=5, cache_clean_timer_seconds=3600.0, shutdown_check_timer_seconds=60.0, sqs_client=None, s3_client=None)[source]¶ This runs checkplot pickle making in a loop until interrupted.
Consumes work task items from an input queue set up by runcp_producer_loop above. For the moment, we don’t generate neighbor light curves since this would require a lot more S3 calls.
Parameters:  in_queue_url (str) – The SQS URL of the input queue to listen to for work assignment messages. The task orders will include the input and output S3 bucket names, as well as the URL of the output queue where this function will report its work-complete or work-failed status.
 workdir (str) – The directory on the local machine where this worker loop will download the input light curves and associated periodfinder results (if any), process them, and produce its output checkplot pickles. These will then be uploaded to the specified S3 output bucket and then deleted from the workdir when the upload is confirmed to make it safely to S3.
 lclist_pkl_s3url (str) – S3 URL of a catalog pickle generated by lcproc.catalogs.make_lclist that contains objectids and coordinates, as well as a kdtree for all of the objects in the current light curve collection being processed. This is used to look up neighbors for each object being processed.
 lc_altexts (sequence of str) – If not None, this is a sequence of alternate extensions to try for the input light curve file other than the one provided in the input task order. For example, to get anything that’s an .sqlite where .sqlite.gz is expected, use lc_altexts=[‘’] to strip the .gz.
 wait_time_seconds (int) – The amount of time to wait in the input SQS queue for an input task order. If this timeout expires and no task has been received, this function goes back to the top of the work loop.
 cache_clean_timer_seconds (float) – The amount of time in seconds to wait before periodically removing old files (such as finder chart FITS, external service result pickles) from the astrobase cache directory. These accumulate as the work items are processed, and take up significant space, so must be removed periodically.
 shutdown_check_timer_seconds (float) – The amount of time to wait before checking for a pending EC2 shutdown message for the instance this worker loop is operating on. If a shutdown is noticed, the worker loop is cancelled in preparation for instance shutdown.
 sqs_client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its SQS operations. Alternatively, pass in an existing boto3.Client instance to reuse it here.
 s3_client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its S3 operations. Alternatively, pass in an existing boto3.Client instance to reuse it here.
Returns: Nothing.
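
The alternate-extension lookup described for lc_altexts can be sketched as follows. This is an illustration of the documented behavior only; the helper name candidate_filenames and the file name are hypothetical, and the real worker may resolve names differently.

```python
import os.path


def candidate_filenames(lc_filename, lc_altexts=('',)):
    """List the original filename followed by alternate-extension variants.

    Hypothetical helper: each alternate extension replaces the final
    extension of lc_filename, so '' simply strips it.
    """
    base, _ext = os.path.splitext(lc_filename)
    candidates = [lc_filename]
    for altext in (lc_altexts or ()):
        alt = base + altext
        if alt not in candidates:
            candidates.append(alt)
    return candidates


# made-up light curve name; '' strips the trailing .gz
names = candidate_filenames('EXAMPLE-0001234.sqlite.gz', lc_altexts=[''])
```

A worker could then try each candidate in order and use the first one that exists in the input bucket.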

astrobase.lcproc.awsrun.
runpf_producer_loop
(lightcurve_list, input_queue, input_bucket, result_queue, result_bucket, pfmethods=('gls', 'pdm', 'mav', 'bls', 'win'), pfkwargs=({}, {}, {}, {}, {}), extra_runpf_kwargs={'getblssnr': True}, process_list_slice=None, purge_queues_when_done=False, delete_queues_when_done=False, download_when_done=True, save_state_when_done=True, s3_client=None, sqs_client=None)[source]¶ This queues up work for periodfinders using SQS.
Parameters:  lightcurve_list (str or list of str) – This is either a string pointing to a file containing a list of light curves filenames to process or the list itself. The names must correspond to the full filenames of files stored on S3, including all prefixes, but not include the ‘s3://<bucket name>/’ bit (these will be added automatically).
 input_queue (str) – This is the name of the SQS queue which will receive processing tasks generated by this function. The queue URL will automatically be obtained from AWS.
 input_bucket (str) – The name of the S3 bucket containing the light curve files to process.
 result_queue (str) – This is the name of the SQS queue that this function will listen to for messages from the workers as they complete processing on their input elements. This function will attempt to match input sent to the input_queue with results coming into the result_queue so it knows how many objects have been successfully processed. If this function receives task results that aren’t in its own input queue, it will acknowledge them so they complete successfully, but not download them automatically. This handles leftover tasks completing from a previous run of this function.
 result_bucket (str) – The name of the S3 bucket which will receive the results from the workers.
 pfmethods (sequence of str) – This is a list of periodfinder method short names as listed in the lcproc.periodfinding.PFMETHODS dict. This is used to tell the worker loop which periodfinders to run on the input light curve.
 pfkwargs (sequence of dicts) – This contains optional kwargs as dicts to be supplied to all of the periodfinder functions listed in pfmethods. This should be the same length as that sequence.
 extra_runpf_kwargs (dict) – This is a dict of kwargs to be supplied to runpf driver function itself.
 process_list_slice (list) –
This is used to index into the input light curve list so a subset of the full list can be processed in this specific run of this function.
Use None for a slice index elem to emulate single slice spec behavior:
process_list_slice = [10, None] -> lightcurve_list[10:]
process_list_slice = [None, 500] -> lightcurve_list[:500]
 purge_queues_when_done (bool) – If this is True, and this function exits (either when all done, or when it is interrupted with a Ctrl+C), all outstanding elements in the input/output queues that have not yet been acknowledged by workers or by this function will be purged. This effectively cancels all outstanding work.
 delete_queues_when_done (bool) – If this is True, and this function exits (either when all done, or when it is interrupted with a Ctrl+C), all outstanding work items will be purged from the input/output queues and the queues themselves will be deleted.
 download_when_done (bool) – If this is True, the generated periodfinding result pickle for each input work item will be downloaded immediately to the current working directory when the worker functions report they’re done with it.
 save_state_when_done (bool) – If this is True, will save the current state of the work item queue and the work items acknowledged as completed to a pickle in the current working directory. Call the runcp_producer_loop_savedstate function below to resume processing from this saved state later.
 s3_client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its S3 download operations. Alternatively, pass in an existing boto3.Client instance to reuse it here.
 sqs_client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its SQS operations. Alternatively, pass in an existing boto3.Client instance to reuse it here.
Returns: Returns the current work state as a dict or str path to the generated work state pickle depending on if save_state_when_done is True.
Return type: dict or str
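
Because pfkwargs must line up one-to-one with pfmethods, it can help to check the pairing before queueing any work. The sketch below is not part of Astrobase; the helper name pair_methods_with_kwargs is hypothetical, and the 'startp' keyword is only a placeholder for whatever options a given periodfinder accepts.

```python
def pair_methods_with_kwargs(pfmethods, pfkwargs):
    """Pair each periodfinder short name with its own kwargs dict.

    Hypothetical helper: raises if the two sequences differ in length,
    since each method needs exactly one kwargs dict.
    """
    if len(pfmethods) != len(pfkwargs):
        raise ValueError(
            'pfmethods has %d entries but pfkwargs has %d'
            % (len(pfmethods), len(pfkwargs))
        )
    return [(method, dict(kwargs)) for method, kwargs in zip(pfmethods, pfkwargs)]


# two of the method short names from the default pfmethods tuple
pairs = pair_methods_with_kwargs(
    ('gls', 'bls'),
    ({}, {'startp': 1.0}),   # placeholder kwargs, assumed for illustration
)
```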

astrobase.lcproc.awsrun.
runpf_consumer_loop
(in_queue_url, workdir, lc_altexts=('', ), wait_time_seconds=5, shutdown_check_timer_seconds=60.0, sqs_client=None, s3_client=None)[source]¶ This runs periodfinding in a loop until interrupted.
Consumes work task items from an input queue set up by runpf_producer_loop above.
Parameters:  in_queue_url (str) – The SQS URL of the input queue to listen to for work assignment messages. The task orders will include the input and output S3 bucket names, as well as the URL of the output queue where this function will report its work-complete or work-failed status.
 workdir (str) – The directory on the local machine where this worker loop will download the input light curves, process them, and produce its output periodfinding result pickles. These will then be uploaded to the specified S3 output bucket, and then deleted from the local disk.
 lc_altexts (sequence of str) – If not None, this is a sequence of alternate extensions to try for the input light curve file other than the one provided in the input task order. For example, to get anything that’s an .sqlite where .sqlite.gz is expected, use lc_altexts=[‘’] to strip the .gz.
 wait_time_seconds (int) – The amount of time to wait in the input SQS queue for an input task order. If this timeout expires and no task has been received, this function goes back to the top of the work loop.
 shutdown_check_timer_seconds (float) – The amount of time to wait before checking for a pending EC2 shutdown message for the instance this worker loop is operating on. If a shutdown is noticed, the worker loop is cancelled in preparation for instance shutdown.
 sqs_client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its SQS operations. Alternatively, pass in an existing boto3.Client instance to reuse it here.
 s3_client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its S3 operations. Alternatively, pass in an existing boto3.Client instance to reuse it here.
Returns: Nothing.
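
The interplay between the queue wait timeout and the periodic shutdown check can be sketched with a small loop. This is a simplified model under stated assumptions, not the actual worker: the names worker_loop, receive_task, handle_task, and shutdown_pending are hypothetical, and the clock is injected so the example runs without sleeping or contacting AWS.

```python
import time


def worker_loop(receive_task, handle_task, shutdown_pending,
                shutdown_check_timer_seconds=60.0, clock=time.monotonic):
    """Process tasks until a pending shutdown is noticed.

    receive_task() is assumed to return None when its wait time expires
    without a task, which sends the loop back to the top.
    """
    last_check = clock()
    while True:
        # only look for a shutdown notice once the timer has elapsed
        if clock() - last_check >= shutdown_check_timer_seconds:
            if shutdown_pending():
                break
            last_check = clock()

        task = receive_task()
        if task is None:
            continue
        handle_task(task)


# deterministic demonstration with a fake clock that advances 30 s per reading
tasks = ['lc-a', 'lc-b']
done = []
ticks = iter(range(0, 10000, 30))


def fake_receive():
    return tasks.pop(0) if tasks else None


worker_loop(fake_receive, done.append,
            shutdown_pending=lambda: not tasks,
            clock=lambda: next(ticks))
```

Here the fake shutdown signal fires once the task list is empty, so the loop exits after both items are handled.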
astrobase.lcproc.catalogs module¶
This contains functions to generate light curve catalogs from collections of light curves.

astrobase.lcproc.catalogs.
make_lclist
(basedir, outfile, use_list_of_filenames=None, lcformat='hatsql', lcformatdir=None, fileglob=None, recursive=True, columns=('objectid', 'objectinfo.ra', 'objectinfo.decl', 'objectinfo.ndet'), makecoordindex=('objectinfo.ra', 'objectinfo.decl'), field_fitsfile=None, field_wcsfrom=None, field_scale=<astropy.visualization.interval.ZScaleInterval object>, field_stretch=<astropy.visualization.stretch.LinearStretch object>, field_colormap=<matplotlib.colors.LinearSegmentedColormap object>, field_findersize=None, field_pltopts={'marker': 'o', 'markeredgecolor': 'red', 'markeredgewidth': 2.0, 'markerfacecolor': 'none', 'markersize': 10.0}, field_grid=False, field_gridcolor='k', field_zoomcontain=True, maxlcs=None, nworkers=2)[source]¶ This generates a light curve catalog for all light curves in a directory.
Given a base directory where all the files are, and a light curve format, this will find all light curves, pull out the keys in each lcdict requested in the columns kwarg for each object, and write them to the requested output pickle file. These keys should be pointers to scalar values (i.e. something like objectinfo.ra is OK, but something like ‘times’ won’t work because it’s a vector).
Generally, this works with light curve reading functions that produce lcdicts as detailed in the docstring for lcproc.register_lcformat. Once you’ve registered your light curve reader functions using the lcproc.register_lcformat function, pass in the formatkey associated with your light curve format, and this function will be able to read all light curves in that format as well as the object information stored in their objectinfo dict.
Parameters:  basedir (str or list of str) –
If this is a str, points to a single directory to search for light curves. If this is a list of str, it must be a list of directories to search for light curves. All of these will be searched to find light curve files matching either your light curve format’s default fileglob (set when you registered your LC format), or a specific fileglob that you can pass in using the fileglob kwarg here. If the recursive kwarg is set, the provided directories will be searched recursively.
If use_list_of_filenames is not None, it will override this argument and the function will take those light curves as the list of files it must process instead of whatever is specified in basedir.
 outfile (str) – This is the name of the output file to write. This will be a pickle file, so a good convention to use for this name is something like ‘mylightcurvecatalog.pkl’.
 use_list_of_filenames (list of str or None) – Use this kwarg to override whatever is provided in basedir and directly pass in a list of light curve files to process. This can speed up this function by a lot because no searches on disk will be performed to find light curve files matching basedir and fileglob.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 fileglob (str or None) – If provided, is a string that is a valid UNIX filename glob. Used to override the default fileglob for this LC format when searching for light curve files in basedir.
 recursive (bool) – If True, the directories specified in basedir will be searched recursively for all light curve files that match the default fileglob for this LC format or a specific one provided in fileglob.
 columns (list of str) –
This is a list of keys in the lcdict produced by your light curve reader function that contain object information, which will be extracted and put into the output light curve catalog. It’s highly recommended that your LC reader function produce a lcdict that contains at least the default keys shown here.
The lcdict keys to extract are specified by using an address scheme:
 First level dict keys can be specified directly: e.g., ‘objectid’ will extract lcdict[‘objectid’]
 Keys at other levels can be specified by using a period to indicate the level:
 e.g., ‘objectinfo.ra’ will extract lcdict[‘objectinfo’][‘ra’]
 e.g., ‘objectinfo.varinfo.features.stetsonj’ will extract lcdict[‘objectinfo’][‘varinfo’][‘features’][‘stetsonj’]
 makecoordindex (list of two str or None) – This is used to specify which lcdict keys contain the right ascension and declination coordinates for this object. If these are provided, the output light curve catalog will have a kdtree built on all object coordinates, which enables fast spatial searches and crossmatching to external catalogs by checkplot and lcproc functions.
 field_fitsfile (str or None) – If this is not None, it should be the path to a FITS image containing the objects these light curves are for. If this is provided, make_lclist will use the WCS information in the FITS itself if field_wcsfrom is None (or from a WCS header file pointed to by field_wcsfrom) to obtain x and y pixel coordinates for all of the objects in the field. A finder chart will also be made using astrobase.plotbase.fits_finder_chart using the corresponding field_scale, _stretch, _colormap, _findersize, _pltopts, _grid, and _gridcolor kwargs for that function, reproduced here to enable customization of the finder chart plot.
 field_wcsfrom (str or None) – If wcsfrom is None, the WCS to transform the RA/Dec to pixel x/y will be taken from the FITS header of fitsfile. If this is not None, it must be a FITS or similar file that contains a WCS header in its first extension.
 field_scale (astropy.visualization.Interval object) – scale sets the normalization for the FITS pixel values. This is an astropy.visualization Interval object. See http://docs.astropy.org/en/stable/visualization/normalization.html for details on scale and stretch objects.
 field_stretch (astropy.visualization.Stretch object) – stretch sets the stretch function for mapping FITS pixel values to output pixel values. This is an astropy.visualization Stretch object. See http://docs.astropy.org/en/stable/visualization/normalization.html for details on scale and stretch objects.
 field_colormap (matplotlib Colormap object) – colormap is a matplotlib color map object to use for the output image.
 field_findersize (None or tuple of two ints) – If findersize is None, the output image size will be set by the NAXIS1 and NAXIS2 keywords in the input fitsfile FITS header. Otherwise, findersize must be a tuple with the intended x and y size of the image in inches (all output images will use a DPI = 100).
 field_pltopts (dict) – field_pltopts controls how the overlay points will be plotted. This is a dict with standard matplotlib marker, etc. kwargs as key-value pairs, e.g. ‘markersize’, ‘markerfacecolor’, etc. The default options make red outline circles at the location of each object in the overlay.
 field_grid (bool) – grid sets if a grid will be made on the output image.
 field_gridcolor (str) – gridcolor sets the color of the grid lines. This is a usual matplotlib color spec string.
 field_zoomcontain (bool) – field_zoomcontain controls if the finder chart will be zoomed to just contain the overlaid points. Everything outside the footprint of these points will be discarded.
 maxlcs (int or None) – This sets how many light curves to process in the input LC list generated by searching for LCs in basedir or in the list provided as use_list_of_filenames.
 nworkers (int) – This sets the number of parallel workers to launch to collect information from the light curves.
Returns: Returns the path to the generated light curve catalog pickle file.
Return type: str
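
The address scheme used by the columns kwarg amounts to walking a nested dict one key at a time. A minimal sketch follows; the helper name get_by_address and the sample lcdict values are made up for illustration, and this is not necessarily how make_lclist implements the lookup.

```python
from functools import reduce


def get_by_address(lcdict, address):
    """Resolve a period-separated address such as 'objectinfo.ra'.

    Hypothetical helper: each period descends one level into the dict.
    """
    return reduce(lambda level, key: level[key], address.split('.'), lcdict)


# made-up lcdict with only the keys needed for this example
lcdict = {
    'objectid': 'EXAMPLE-0001',
    'objectinfo': {
        'ra': 120.5,
        'decl': -45.25,
        'ndet': 4000,
        'varinfo': {'features': {'stetsonj': 0.8}},
    },
}

columns = ('objectid', 'objectinfo.ra', 'objectinfo.decl', 'objectinfo.ndet')
row = {column: get_by_address(lcdict, column) for column in columns}
```

An address that points at a missing key raises KeyError in this sketch, which is one reason to have every light curve reader produce at least the default keys.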

astrobase.lcproc.catalogs.
filter_lclist
(lc_catalog, objectidcol='objectid', racol='ra', declcol='decl', xmatchexternal=None, xmatchdistarcsec=3.0, externalcolnums=(0, 1, 2), externalcolnames=('objectid', 'ra', 'decl'), externalcoldtypes='U20, f8, f8', externalcolsep=None, externalcommentchar='#', conesearch=None, conesearchworkers=1, columnfilters=None, field_fitsfile=None, field_wcsfrom=None, field_scale=<astropy.visualization.interval.ZScaleInterval object>, field_stretch=<astropy.visualization.stretch.LinearStretch object>, field_colormap=<matplotlib.colors.LinearSegmentedColormap object>, field_findersize=None, field_pltopts={'marker': 'o', 'markeredgecolor': 'red', 'markeredgewidth': 2.0, 'markerfacecolor': 'none', 'markersize': 10.0}, field_grid=False, field_gridcolor='k', field_zoomcontain=True, copylcsto=None)[source]¶ This is used to perform conesearch, crossmatch, and columnfilter operations on a light curve catalog generated by make_lclist.
Uses the output of make_lclist above. This function returns a list of light curves matching various criteria specified by the xmatchexternal, conesearch, and columnfilters kwargs. Use this function to generate input lists for other lcproc functions, e.g. lcproc.lcvfeatures.parallel_varfeatures, lcproc.periodfinding.parallel_pf, and lcproc.lcbin.parallel_timebin, among others.
The operations are applied in this order if more than one is specified: xmatchexternal -> conesearch -> columnfilters. All results from these operations are joined using a logical AND operation.
Parameters:  objectidcol (str) – This is the name of the object ID column in the light curve catalog.
 racol (str) – This is the name of the RA column in the light curve catalog.
 declcol (str) – This is the name of the Dec column in the light curve catalog.
 xmatchexternal (str or None) – If provided, this is the filename of a text file containing objectids, ras and decs to match the objects in the light curve catalog to by their positions.
 xmatchdistarcsec (float) – This is the distance in arcseconds to use when crossmatching to the external catalog in xmatchexternal.
 externalcolnums (sequence of int) – This is a list of the zero-indexed column numbers of columns to extract from the external catalog file.
 externalcolnames (sequence of str) – This is a list of names of columns that will be extracted from the external catalog file. This is the same length as externalcolnums. These must contain the names provided as the objectid, ra, and decl column names so this function knows which column numbers correspond to those columns and can use them to set up the crossmatch.
 externalcoldtypes (str) – This is a CSV string containing numpy dtype definitions for all columns listed to extract from the external catalog file. The number of dtype definitions should be equal to the number of columns to extract.
 externalcolsep (str or None) – The column separator to use when extracting columns from the external catalog file. If None, any whitespace between columns is used as the separator.
 externalcommentchar (str) – The character indicating that a line in the external catalog file is to be ignored.
 conesearch (list of float) –
This is used to specify conesearch parameters. It should be a three element list:
[center_ra_deg, center_decl_deg, search_radius_deg]
 conesearchworkers (int) – The number of parallel workers to launch for the conesearch operation.
 columnfilters (list of str) –
This is a list of strings indicating any filters to apply on each column in the light curve catalog. All column filters are applied in the specified sequence and are combined with a logical AND operator. The format of each filter string should be:
‘<lc_catalog column>|<operator>|<operand>’
where:
 <lc_catalog column> is a column in the lc_catalog pickle file
 <operator> is one of: ‘lt’, ‘gt’, ‘le’, ‘ge’, ‘eq’, ‘ne’, which correspond to the usual operators: <, >, <=, >=, ==, != respectively.
 <operand> is a float, int, or string.
 field_fitsfile (str or None) – If this is not None, it should be the path to a FITS image containing the objects these light curves are for. If this is provided, this function will use the WCS information in the FITS itself if field_wcsfrom is None (or from a WCS header file pointed to by field_wcsfrom) to obtain x and y pixel coordinates for all of the objects in the field. A finder chart will also be made using astrobase.plotbase.fits_finder_chart using the corresponding field_scale, _stretch, _colormap, _findersize, _pltopts, _grid, and _gridcolor kwargs for that function, reproduced here to enable customization of the finder chart plot.
 field_wcsfrom (str or None) – If wcsfrom is None, the WCS to transform the RA/Dec to pixel x/y will be taken from the FITS header of fitsfile. If this is not None, it must be a FITS or similar file that contains a WCS header in its first extension.
 field_scale (astropy.visualization.Interval object) – scale sets the normalization for the FITS pixel values. This is an astropy.visualization Interval object. See http://docs.astropy.org/en/stable/visualization/normalization.html for details on scale and stretch objects.
 field_stretch (astropy.visualization.Stretch object) – stretch sets the stretch function for mapping FITS pixel values to output pixel values. This is an astropy.visualization Stretch object. See http://docs.astropy.org/en/stable/visualization/normalization.html for details on scale and stretch objects.
 field_colormap (matplotlib Colormap object) – colormap is a matplotlib color map object to use for the output image.
 field_findersize (None or tuple of two ints) – If findersize is None, the output image size will be set by the NAXIS1 and NAXIS2 keywords in the input fitsfile FITS header. Otherwise, findersize must be a tuple with the intended x and y size of the image in inches (all output images will use a DPI = 100).
 field_pltopts (dict) – field_pltopts controls how the overlay points will be plotted. This is a dict with standard matplotlib marker, etc. kwargs as key-value pairs, e.g. ‘markersize’, ‘markerfacecolor’, etc. The default options make red outline circles at the location of each object in the overlay.
 field_grid (bool) – grid sets if a grid will be made on the output image.
 field_gridcolor (str) – gridcolor sets the color of the grid lines. This is a usual matplotlib color spec string.
 field_zoomcontain (bool) – field_zoomcontain controls if the finder chart will be zoomed to just contain the overlaid points. Everything outside the footprint of these points will be discarded.
 copylcsto (str) – If this is provided, it is interpreted as a directory target to copy all the light curves that match the specified conditions.
Returns: Returns a two-element tuple: (matching_object_lcfiles, matching_objectids) if conesearch and/or column filters are used. If xmatchexternal is also used, a three-element tuple is returned: (matching_object_lcfiles, matching_objectids, extcat_matched_objectids).
Return type: tuple
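
To make the columnfilters format concrete, here is a small sketch that parses filter strings and ANDs them together over a list of catalog rows. It assumes the three fields are separated by a '|' character and that numeric operands should be compared as floats; the helper names, the sample rows, and the column names are hypothetical, and the real function operates on the catalog pickle rather than on a list of dicts.

```python
import operator

# operator short names as listed in the columnfilters description
OPERATORS = {
    'lt': operator.lt, 'gt': operator.gt,
    'le': operator.le, 'ge': operator.ge,
    'eq': operator.eq, 'ne': operator.ne,
}


def parse_filter(filterstr, sep='|'):
    """Split 'column|operator|operand' into (column, function, operand)."""
    column, opname, operand = filterstr.split(sep)
    try:
        operand = float(operand)   # numeric operands compare as floats
    except ValueError:
        pass                       # otherwise keep the operand as a string
    return column, OPERATORS[opname], operand


def apply_filters(rows, columnfilters):
    """Keep only the rows that satisfy every filter (logical AND)."""
    parsed = [parse_filter(filterstr) for filterstr in columnfilters]
    return [
        row for row in rows
        if all(func(row[column], operand) for column, func, operand in parsed)
    ]


# made-up catalog rows for illustration
rows = [
    {'objectid': 'A', 'ndet': 1200, 'sdssr': 11.5},
    {'objectid': 'B', 'ndet': 300, 'sdssr': 10.2},
    {'objectid': 'C', 'ndet': 5000, 'sdssr': 14.9},
]
matched = apply_filters(rows, ['ndet|gt|1000', 'sdssr|lt|13.0'])
```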

astrobase.lcproc.catalogs.
add_cpinfo_to_lclist
(checkplots, initial_lc_catalog, magcol, outfile, checkplotglob='checkplot*.pkl*', infokeys=[('comments', <class 'numpy.str_'>, False, True, '', ''), ('objectinfo.objecttags', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.twomassid', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.bmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.vmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.rmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.imag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.jmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.hmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.kmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.sdssu', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.sdssg', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.sdssr', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.sdssi', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.sdssz', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_bmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_vmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_rmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_imag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_jmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_hmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_kmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_sdssu', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_sdssg', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_sdssr', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_sdssi', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_sdssz', <class 'numpy.float64'>, True, True, nan, 
nan), ('objectinfo.extinction_bmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_vmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_rmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_imag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_jmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_hmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_kmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_sdssu', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_sdssg', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_sdssr', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_sdssi', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_sdssz', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.color_classes', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.pmra', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.pmdecl', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.propermotion', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.rpmj', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gl', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gb', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gaia_status', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.gaia_ids.0', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.gaiamag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gaia_parallax', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gaia_parallax_err', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gaia_absmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.simbad_best_mainid', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.simbad_best_objtype', 
<class 'numpy.str_'>, True, True, '', ''), ('objectinfo.simbad_best_allids', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.simbad_best_distarcsec', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.ticid', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.tic_version', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.tessmag', <class 'numpy.float64'>, True, True, nan, nan), ('varinfo.vartags', <class 'numpy.str_'>, False, True, '', ''), ('varinfo.varperiod', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.varepoch', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.varisperiodic', <class 'numpy.int64'>, False, True, 0, 0), ('varinfo.objectisvar', <class 'numpy.int64'>, False, True, 0, 0), ('varinfo.features.median', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.mad', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.stdev', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.mag_iqr', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.skew', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.kurtosis', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.stetsonj', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.stetsonk', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.eta_normal', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.linear_fit_slope', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.magnitude_ratio', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.beyond1std', <class 'numpy.float64'>, False, True, nan, nan)], nworkers=2)[source]¶ This adds checkplot info to the initial light curve catalogs generated by make_lclist.
This is used to incorporate all the extra info checkplots can have for objects back into columns in the light curve catalog produced by make_lclist. Objects are matched between the checkplots and the light curve catalog using their objectid. This then allows one to search this ‘augmented’ light curve catalog by these extra columns. The ‘augmented’ light curve catalog also forms the basis for the search interface provided by the LCCServer.
The default list of keys that will be extracted from a checkplot and added as columns in the initial light curve catalog is listed above in the CPINFO_DEFAULTKEYS list.
Parameters:  checkplots (str or list) – If this is a str, it is interpreted as a directory which will be searched for checkplot pickle files using checkplotglob. If this is a list, it will be interpreted as a list of checkplot pickle files to process.
 initial_lc_catalog (str) – This is the path to the light curve catalog pickle made by make_lclist.
 magcol (str) – This is used to indicate the light curve magnitude column to extract magnitude column specific information. For example, Stetson variability indices can be generated using magnitude measurements in separate photometric apertures, which appear in separate magcols in the checkplot. To associate each such feature of the object with its specific magcol, pass that magcol in here. This magcol will then be added as a prefix to the resulting column in the ‘augmented’ LC catalog, e.g. Stetson J will appear as magcol1_stetsonj and magcol2_stetsonj for two separate magcols.
 outfile (str) – This is the file name of the output ‘augmented’ light curve catalog pickle file that will be written.
 infokeys (list of tuples) –
This is a list of keys to extract from the checkplot and some info on how this extraction is to be done. Each key entry is a six-element tuple of the following form:
 key name in the checkplot
 numpy dtype of the value of this key
 False if key is associated with a magcol or True otherwise
 False if subsequent updates to the same column name will append to existing key values in the output augmented light curve catalog or True if these will overwrite the existing key value
 character to use to substitute a None value of the key in the checkplot in the output light curve catalog column
 character to use to substitute a nan value of the key in the checkplot in the output light curve catalog column
See the CPINFO_DEFAULTKEYS list above for examples.
 nworkers (int) – The number of parallel workers to launch to extract checkplot information.
Returns: Returns the path to the generated ‘augmented’ light curve catalog pickle file.
Return type: str
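The six-element infokeys tuples described above can be illustrated with a small, self-contained sketch. This is not the Astrobase implementation: the helper names resolve_key and extract_column, the miniature checkplot dict, and the -99.0 substitution values are all hypothetical, and the column-naming rule (last key component, prefixed by the magcol for magcol-specific keys) is an assumption based on the magcol1_stetsonj example given for the magcol parameter.

```python
import math

def resolve_key(cpdict, dotted_key):
    """Walk a dotted key such as 'objectinfo.gaia_ids.0' through nested dicts and lists."""
    value = cpdict
    for part in dotted_key.split("."):
        # A purely numeric component is treated as a list index.
        value = value[int(part)] if part.isdecimal() else value[part]
    return value

def extract_column(cpdict, keyspec, magcol):
    """Return (column_name, value) for one six-element key spec (hypothetical helper)."""
    key, dtype, not_magcol_specific, overwrite, nonesub, nansub = keyspec
    value = resolve_key(cpdict, key)
    # Substitute the configured placeholders for None and nan values.
    if value is None:
        value = nonesub
    elif isinstance(value, float) and math.isnan(value):
        value = nansub
    # Assumption: the column is named after the last key component, and
    # magcol-specific keys get the magcol as a prefix.
    name = key.split(".")[-1]
    if not not_magcol_specific:
        name = f"{magcol}_{name}"
    return name, value

# Hypothetical miniature checkplot; real checkplot pickles hold far more.
checkplot = {
    "objectinfo": {"vmag": 12.3, "ticid": None, "gaia_ids": ["4321", "8765"]},
    "varinfo": {"features": {"stetsonj": 0.42, "stetsonk": float("nan")}},
}

# Builtin str/float stand in for the numpy dtypes used in the real defaults.
ticid_spec = ("objectinfo.ticid", str, True, True, "", "")
stetsonj_spec = ("varinfo.features.stetsonj", float, False, True, -99.0, -99.0)
stetsonk_spec = ("varinfo.features.stetsonk", float, False, True, -99.0, -99.0)

columns = dict(
    extract_column(checkplot, spec, "magcol1")
    for spec in (ticid_spec, stetsonj_spec, stetsonk_spec)
)
```

Here columns ends up with a ticid entry holding the None substitute and two magcol1-prefixed Stetson entries, the second of which holds the nan substitute.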
astrobase.lcproc.checkplotgen module¶
This contains functions to generate checkplot pickles from a collection of light curves (optionally including periodfinding results).

astrobase.lcproc.checkplotgen.
update_checkplotdict_nbrlcs
(checkplotdict, timecol, magcol, errcol, lcformat='hatsql', lcformatdir=None, verbose=True)[source]¶ For all neighbors in a checkplotdict, make LCs and phased LCs.
Parameters:  checkplotdict (dict) – This is the checkplot to process. The light curves for the neighbors to the object here will be extracted from the stored file paths, and this function will make plots of these timeseries. If the object has ‘best’ periods and epochs generated by periodfinder functions in this checkplotdict, phased light curve plots of each neighbor will be made using these to check the effects of blending.
 timecol,magcol,errcol (str) – The timecol, magcol, and errcol keys used to generate this object’s checkplot. This is used to extract the correct timeseries from the neighbors’ light curves.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
Returns: The input checkplotdict is returned with the neighbor light curve plots added in.
Return type: dict

astrobase.lcproc.checkplotgen.
runcp
(pfpickle, outdir, lcbasedir, lcfname=None, cprenorm=False, lclistpkl=None, nbrradiusarcsec=60.0, maxnumneighbors=5, makeneighborlcs=True, fast_mode=False, gaia_max_timeout=60.0, gaia_mirror=None, xmatchinfo=None, xmatchradiusarcsec=3.0, minobservations=99, sigclip=10.0, lcformat='hatsql', lcformatdir=None, timecols=None, magcols=None, errcols=None, skipdone=False, done_callback=None, done_callback_args=None, done_callback_kwargs=None)[source]¶ This makes a checkplot pickle for the given periodfinding result pickle produced by lcproc.periodfinding.runpf.
Parameters:  pfpickle (str or None) – This is the filename of the periodfinding result pickle file created by lcproc.periodfinding.runpf. If this is None, the checkplot will be made anyway, but no phased LC information will be collected into the output checkplot pickle. This can be useful for just collecting GAIA and other external information and making LC plots for an object.
 outdir (str) – This is the directory to which the output checkplot pickle will be written.
 lcbasedir (str) – The base directory where this function will look for the light curve file associated with the object in the input periodfinding result pickle file.
 lcfname (str or None) –
This is usually None because we’ll get the path to the light curve associated with this periodfinding pickle from the pickle itself. If pfpickle is None, however, this function will use lcfname to look up the light curve file instead. If both are provided, the value of lcfname takes precedence.
Providing the light curve file name in this kwarg is useful when you’re making checkplots directly from light curve files and not including periodfinder results (perhaps because periodfinding takes a long time for large collections of LCs).
 cprenorm (bool) – Set this to True if the light curves should be renormalized by checkplot.checkplot_pickle. This is set to False by default because we do our own normalization in this function using the light curve’s registered normalization function and pass the normalized times, mags, errs to the checkplot.checkplot_pickle function.
 lclistpkl (str or dict) – This is either the filename of a pickle or the actual dict produced by lcproc.make_lclist. This is used to gather neighbor information.
 nbrradiusarcsec (float) – The radius in arcseconds to use for a search conducted around the coordinates of this object to look for any potential confusion and blending of variability amplitude caused by their proximity.
 maxnumneighbors (int) – The maximum number of neighbors that will have their light curves and magnitudes noted in this checkplot as potential blends with the target object.
 makeneighborlcs (bool) – If True, will make light curve and phased light curve plots for all neighbors to the current object found in the catalog passed in using lclistpkl.
 fast_mode (bool or float) –
This runs the external catalog operations in a “fast” mode, with short timeouts and not trying to hit external catalogs that take a long time to respond.
If this is set to True, the default settings for the external requests will then become:
skyview_lookup = False
skyview_timeout = 10.0
skyview_retry_failed = False
dust_timeout = 10.0
gaia_submit_timeout = 7.0
gaia_max_timeout = 10.0
gaia_submit_tries = 2
complete_query_later = False
search_simbad = False
If this is a float, will run in “fast” mode with the provided timeout value in seconds and the following settings:
skyview_lookup = True
skyview_timeout = fast_mode
skyview_retry_failed = False
dust_timeout = fast_mode
gaia_submit_timeout = 0.66*fast_mode
gaia_max_timeout = fast_mode
gaia_submit_tries = 2
complete_query_later = False
search_simbad = False
 gaia_max_timeout (float) – Sets the timeout in seconds to use when waiting for the GAIA service to respond to our request for the object’s information. Note that if fast_mode is set, this is ignored.
 gaia_mirror (str or None) – This sets the GAIA mirror to use. This is a key in the services.gaia.GAIA_URLS dict which defines the URLs to hit for each mirror.
 xmatchinfo (str or dict) – This is either the xmatch dict produced by the function load_xmatch_external_catalogs above, or the path to the xmatch info pickle file produced by that function.
 xmatchradiusarcsec (float) – This is the crossmatching radius to use in arcseconds.
 minobservations (int) – The minimum number of observations the input object’s mag/flux timeseries must have for this function to plot its light curve and phased light curve. If the object has fewer than this number, no light curves will be plotted, but the checkplotdict will still contain all of the other information.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will clip out dimmings greater than 10-sigma and brightenings greater than 3-sigma. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 timecols (list of str or None) – The timecol keys to use from the lcdict in generating this checkplot.
 magcols (list of str or None) – The magcol keys to use from the lcdict in generating this checkplot.
 errcols (list of str or None) – The errcol keys to use from the lcdict in generating this checkplot.
 skipdone (bool) – This indicates if this function will skip creating checkplots that already exist corresponding to the current objectid and magcol. If skipdone is set to True, this will be done.
 done_callback (Python function or None) –
This is used to provide a function to execute after the checkplot pickles are generated. This is useful if you want to stream the results of checkplot making to some other process, e.g. directly running an ingestion into an LCCServer collection. The function will always get the list of the generated checkplot pickles as its first arg, and all of the kwargs for runcp in the kwargs dict. Additional args and kwargs can be provided by giving a list in the done_callback_args kwarg and a dict in the done_callback_kwargs kwarg.
NOTE: the function you pass in here should be pickleable by normal Python if you want to use it with the parallel_cp and parallel_cp_pfdir functions below.
 done_callback_args (tuple or None) – If not None, contains any args to pass into the done_callback function.
 done_callback_kwargs (dict or None) – If not None, contains any kwargs to pass into the done_callback function.
Returns: This returns a list of checkplot pickle filenames with one element for each (timecol, magcol, errcol) combination provided in the default lcformat config or in the timecols, magcols, errcols kwargs.
Return type: list of str
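The two fast_mode setting lists given above can be summarized as a small lookup function. This is only a sketch of the documented values, not the code Astrobase runs: the function name external_request_settings is hypothetical, and for fast_mode=False it returns only gaia_max_timeout because the other non-fast defaults are not listed in this section.

```python
def external_request_settings(fast_mode=False, gaia_max_timeout=60.0):
    """Return the external-service settings implied by fast_mode (hypothetical helper)."""
    if fast_mode is True:
        # Fixed short timeouts; SkyView and SIMBAD lookups are skipped.
        return {
            "skyview_lookup": False,
            "skyview_timeout": 10.0,
            "skyview_retry_failed": False,
            "dust_timeout": 10.0,
            "gaia_submit_timeout": 7.0,
            "gaia_max_timeout": 10.0,
            "gaia_submit_tries": 2,
            "complete_query_later": False,
            "search_simbad": False,
        }
    if isinstance(fast_mode, (int, float)) and fast_mode > 0:
        # A float sets every timeout from the one provided value.
        return {
            "skyview_lookup": True,
            "skyview_timeout": fast_mode,
            "skyview_retry_failed": False,
            "dust_timeout": fast_mode,
            "gaia_submit_timeout": 0.66 * fast_mode,
            "gaia_max_timeout": fast_mode,
            "gaia_submit_tries": 2,
            "complete_query_later": False,
            "search_simbad": False,
        }
    # Not in fast mode: the user-supplied gaia_max_timeout applies.
    return {"gaia_max_timeout": gaia_max_timeout}

fast_defaults = external_request_settings(fast_mode=True)
five_seconds = external_request_settings(fast_mode=5.0)
normal = external_request_settings(gaia_max_timeout=60.0)
```

Note that in both fast modes the gaia_max_timeout argument is overridden, which matches the remark that it is ignored when fast_mode is set.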

astrobase.lcproc.checkplotgen.
runcp_worker
(task)[source]¶ This is the worker for running checkplots.
Parameters: task (tuple) – This is of the form: (pfpickle, outdir, lcbasedir, kwargs).
Returns: The list of checkplot pickles returned by the runcp function.
Return type: list of str
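Packing everything into one task tuple lets a worker be handed to map-style parallel drivers that pass a single argument per item. The sketch below shows that unpacking pattern only; runcp_standin is a hypothetical stand-in that fabricates output names, and the task contents, paths, and magcol names are made up for illustration.

```python
def runcp_standin(pfpickle, outdir, lcbasedir, **kwargs):
    """Hypothetical stand-in for runcp: returns one fake output name per magcol."""
    magcols = kwargs.get("magcols") or ["mag"]
    return [f"{outdir}/checkplot-{pfpickle}-{magcol}" for magcol in magcols]

def worker_sketch(task):
    """Unpack a (pfpickle, outdir, lcbasedir, kwargs) task and run it."""
    pfpickle, outdir, lcbasedir, kwargs = task
    return runcp_standin(pfpickle, outdir, lcbasedir, **kwargs)

# Each task carries its positional args plus a dict of keyword args.
tasks = [
    ("obj1", "/tmp/cps", "/data/lcs", {"magcols": ["aep_000", "aep_001"]}),
    ("obj2", "/tmp/cps", "/data/lcs", {"skipdone": True}),
]

# The builtin map stands in for a process pool's map here.
results = list(map(worker_sketch, tasks))
```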

astrobase.lcproc.checkplotgen.
parallel_cp
(pfpicklelist, outdir, lcbasedir, fast_mode=False, lcfnamelist=None, cprenorm=False, lclistpkl=None, gaia_max_timeout=60.0, gaia_mirror=None, nbrradiusarcsec=60.0, maxnumneighbors=5, makeneighborlcs=True, xmatchinfo=None, xmatchradiusarcsec=3.0, sigclip=10.0, minobservations=99, lcformat='hatsql', lcformatdir=None, timecols=None, magcols=None, errcols=None, skipdone=False, done_callback=None, done_callback_args=None, done_callback_kwargs=None, liststartindex=None, maxobjects=None, nworkers=2)[source]¶ This drives the parallel execution of runcp for a list of periodfinding result pickles.
Parameters:  pfpicklelist (list of str or list of Nones) – This is the list of the filenames of the periodfinding result pickles to process. To make checkplots using the light curves directly, set this to a list of Nones with the same length as the list of light curve files that you provide in lcfnamelist.
 outdir (str) – The directory the checkplot pickles will be written to.
 lcbasedir (str) – The base directory that this function will look in to find the light curves pointed to by the periodfinding result files. If you’re using lcfnamelist to provide a list of light curve filenames directly, this arg is ignored.
 lcfnamelist (list of str or None) – If this is provided, it must be a list of the input light curve filenames to process. These can either be associated with each input periodfinder result pickle, or can be provided standalone to make checkplots without phased LC plots in them. In the second case, you must set pfpicklelist to a list of Nones that matches the length of lcfnamelist.
 cprenorm (bool) – Set this to True if the light curves should be renormalized by checkplot.checkplot_pickle. This is set to False by default because we do our own normalization in this function using the light curve’s registered normalization function and pass the normalized times, mags, errs to the checkplot.checkplot_pickle function.
 lclistpkl (str or dict) – This is either the filename of a pickle or the actual dict produced by lcproc.make_lclist. This is used to gather neighbor information.
 nbrradiusarcsec (float) – The radius in arcseconds to use for a search conducted around the coordinates of this object to look for any potential confusion and blending of variability amplitude caused by their proximity.
 maxnumneighbors (int) – The maximum number of neighbors that will have their light curves and magnitudes noted in this checkplot as potential blends with the target object.
 makeneighborlcs (bool) – If True, will make light curve and phased light curve plots for all neighbors found in the object collection for each input object.
 fast_mode (bool or float) –
This runs the external catalog operations in a “fast” mode, with short timeouts and not trying to hit external catalogs that take a long time to respond.
If this is set to True, the default settings for the external requests will then become:
skyview_lookup = False
skyview_timeout = 10.0
skyview_retry_failed = False
dust_timeout = 10.0
gaia_submit_timeout = 7.0
gaia_max_timeout = 10.0
gaia_submit_tries = 2
complete_query_later = False
search_simbad = False
If this is a float, will run in “fast” mode with the provided timeout value in seconds and the following settings:
skyview_lookup = True
skyview_timeout = fast_mode
skyview_retry_failed = False
dust_timeout = fast_mode
gaia_submit_timeout = 0.66*fast_mode
gaia_max_timeout = fast_mode
gaia_submit_tries = 2
complete_query_later = False
search_simbad = False
 gaia_max_timeout (float) – Sets the timeout in seconds to use when waiting for the GAIA service to respond to our request for the object’s information. Note that if fast_mode is set, this is ignored.
 gaia_mirror (str or None) – This sets the GAIA mirror to use. This is a key in the services.gaia.GAIA_URLS dict which defines the URLs to hit for each mirror.
 xmatchinfo (str or dict) – This is either the xmatch dict produced by the function load_xmatch_external_catalogs above, or the path to the xmatch info pickle file produced by that function.
 xmatchradiusarcsec (float) – This is the crossmatching radius to use in arcseconds.
 minobservations (int) – The minimum number of observations the input object’s mag/flux timeseries must have for this function to plot its light curve and phased light curve. If the object has fewer than this number, no light curves will be plotted, but the checkplotdict will still contain all of the other information.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will clip out dimmings greater than 10-sigma and brightenings greater than 3-sigma. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 timecols (list of str or None) – The timecol keys to use from the lcdict in generating this checkplot.
 magcols (list of str or None) – The magcol keys to use from the lcdict in generating this checkplot.
 errcols (list of str or None) – The errcol keys to use from the lcdict in generating this checkplot.
 skipdone (bool) – This indicates if this function will skip creating checkplots that already exist corresponding to the current objectid and magcol. If skipdone is set to True, this will be done.
 done_callback (Python function or None) –
This is used to provide a function to execute after the checkplot pickles are generated. This is useful if you want to stream the results of checkplot making to some other process, e.g. directly running an ingestion into an LCCServer collection. The function will always get the list of the generated checkplot pickles as its first arg, and all of the kwargs for runcp in the kwargs dict. Additional args and kwargs can be provided by giving a list in the done_callback_args kwarg and a dict in the done_callback_kwargs kwarg.
NOTE: the function you pass in here should be pickleable by normal Python if you want to use it with the parallel_cp and parallel_cp_pfdir functions.
 done_callback_args (tuple or None) – If not None, contains any args to pass into the done_callback function.
 done_callback_kwargs (dict or None) – If not None, contains any kwargs to pass into the done_callback function.
 liststartindex (int) – The index of the pfpicklelist (and lcfnamelist if provided) to start working at.
 maxobjects (int) – The maximum number of objects to process in this run. Use this with liststartindex to effectively distribute working on a large list of input periodfinding result pickles (and light curves if lcfnamelist is also provided) over several sessions or machines.
 nworkers (int) – The number of parallel workers that will work on the checkplot generation process.
Returns: This returns a dict with keys = input periodfinding pickles and vals = list of the corresponding checkplot pickles produced.
Return type: dict
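The liststartindex and maxobjects parameters described above amount to taking a slice of the input lists, which is what makes it possible to split one long list over several sessions or machines. A minimal sketch of that slicing, assuming plain Python list semantics; the helper name select_chunk and the file names are hypothetical.

```python
def select_chunk(pfpicklelist, lcfnamelist=None, liststartindex=None, maxobjects=None):
    """Return the portion of the inputs one run would work on (hypothetical helper)."""
    start = liststartindex or 0
    stop = None if maxobjects is None else start + maxobjects
    pf_chunk = pfpicklelist[start:stop]
    # lcfnamelist, when given, is sliced identically so the two lists stay aligned.
    lc_chunk = None if lcfnamelist is None else lcfnamelist[start:stop]
    return pf_chunk, lc_chunk

# Ten hypothetical light curve files and no period-finding results:
# pfpicklelist is a list of Nones with the same length as lcfnamelist.
lcfnamelist = [f"lc-{i:02d}.pkl" for i in range(10)]
pfpicklelist = [None] * len(lcfnamelist)

# Three runs of at most four objects each cover the whole list.
run0 = select_chunk(pfpicklelist, lcfnamelist, liststartindex=0, maxobjects=4)
run1 = select_chunk(pfpicklelist, lcfnamelist, liststartindex=4, maxobjects=4)
run2 = select_chunk(pfpicklelist, lcfnamelist, liststartindex=8, maxobjects=4)
```

The last run simply gets the remaining two objects, so maxobjects does not need to divide the list length evenly.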

astrobase.lcproc.checkplotgen.
parallel_cp_pfdir
(pfpickledir, outdir, lcbasedir, pfpickleglob='periodfinding*.pkl*', lclistpkl=None, cprenorm=False, nbrradiusarcsec=60.0, maxnumneighbors=5, makeneighborlcs=True, fast_mode=False, gaia_max_timeout=60.0, gaia_mirror=None, xmatchinfo=None, xmatchradiusarcsec=3.0, minobservations=99, sigclip=10.0, lcformat='hatsql', lcformatdir=None, timecols=None, magcols=None, errcols=None, skipdone=False, done_callback=None, done_callback_args=None, done_callback_kwargs=None, maxobjects=None, nworkers=32)[source]¶ This drives the parallel execution of runcp for a directory of periodfinding pickles.
Parameters:  pfpickledir (str) – This is the directory containing all of the periodfinding pickles to process.
 outdir (str) – The directory the checkplot pickles will be written to.
 lcbasedir (str) – The base directory that this function will look in to find the light curves pointed to by the periodfinding result files. If you’re using lcfnamelist to provide a list of light curve filenames directly, this arg is ignored.
 pfpickleglob (str) – This is a UNIX file glob to select periodfinding result pickles in the specified pfpickledir.
 lclistpkl (str or dict) – This is either the filename of a pickle or the actual dict produced by lcproc.make_lclist. This is used to gather neighbor information.
 cprenorm (bool) – Set this to True if the light curves should be renormalized by checkplot.checkplot_pickle. This is set to False by default because we do our own normalization in this function using the light curve’s registered normalization function and pass the normalized times, mags, errs to the checkplot.checkplot_pickle function.
 nbrradiusarcsec (float) – The radius in arcseconds to use for a search conducted around the coordinates of this object to look for any potential confusion and blending of variability amplitude caused by their proximity.
 maxnumneighbors (int) – The maximum number of neighbors that will have their light curves and magnitudes noted in this checkplot as potential blends with the target object.
 makeneighborlcs (bool) – If True, will make light curve and phased light curve plots for all neighbors found in the object collection for each input object.
 fast_mode (bool or float) –
This runs the external catalog operations in a “fast” mode, with short timeouts and not trying to hit external catalogs that take a long time to respond.
If this is set to True, the default settings for the external requests will then become:
skyview_lookup = False
skyview_timeout = 10.0
skyview_retry_failed = False
dust_timeout = 10.0
gaia_submit_timeout = 7.0
gaia_max_timeout = 10.0
gaia_submit_tries = 2
complete_query_later = False
search_simbad = False
If this is a float, will run in “fast” mode with the provided timeout value in seconds and the following settings:
skyview_lookup = True
skyview_timeout = fast_mode
skyview_retry_failed = False
dust_timeout = fast_mode
gaia_submit_timeout = 0.66*fast_mode
gaia_max_timeout = fast_mode
gaia_submit_tries = 2
complete_query_later = False
search_simbad = False
 gaia_max_timeout (float) – Sets the timeout in seconds to use when waiting for the GAIA service to respond to our request for the object’s information. Note that if fast_mode is set, this is ignored.
 gaia_mirror (str or None) – This sets the GAIA mirror to use. This is a key in the services.gaia.GAIA_URLS dict which defines the URLs to hit for each mirror.
 xmatchinfo (str or dict) – This is either the xmatch dict produced by the function load_xmatch_external_catalogs above, or the path to the xmatch info pickle file produced by that function.
 xmatchradiusarcsec (float) – This is the crossmatching radius to use in arcseconds.
 minobservations (int) – The minimum number of observations the input object’s mag/flux timeseries must have for this function to plot its light curve and phased light curve. If the object has fewer than this number, no light curves will be plotted, but the checkplotdict will still contain all of the other information.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will clip out dimmings greater than 10-sigma and brightenings greater than 3-sigma. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 timecols (list of str or None) – The timecol keys to use from the lcdict in generating this checkplot.
 magcols (list of str or None) – The magcol keys to use from the lcdict in generating this checkplot.
 errcols (list of str or None) – The errcol keys to use from the lcdict in generating this checkplot.
 skipdone (bool) – This indicates if this function will skip creating checkplots that already exist corresponding to the current objectid and magcol. If skipdone is set to True, this will be done.
 done_callback (Python function or None) –
This is used to provide a function to execute after the checkplot pickles are generated. This is useful if you want to stream the results of checkplot making to some other process, e.g. directly running an ingestion into an LCCServer collection. The function will always get the list of the generated checkplot pickles as its first arg, and all of the kwargs for runcp in the kwargs dict. Additional args and kwargs can be provided by giving a list in the done_callback_args kwarg and a dict in the done_callback_kwargs kwarg.
NOTE: the function you pass in here should be pickleable by normal Python if you want to use it with the parallel_cp and parallel_cp_pfdir functions.
 done_callback_args (tuple or None) – If not None, contains any args to pass into the done_callback function.
 done_callback_kwargs (dict or None) – If not None, contains any kwargs to pass into the done_callback function.
 maxobjects (int) – The maximum number of objects to process in this run.
 nworkers (int) – The number of parallel workers that will work on the checkplot generation process.
Returns: This returns a dict with keys = input periodfinding pickles and vals = list of the corresponding checkplot pickles produced.
Return type: dict
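The asymmetric sigma-clip described for the sigclip parameter is easiest to follow with numbers. The sketch below is a pure-Python illustration, not the routine Astrobase uses: the function name asymmetric_sigclip is hypothetical, and estimating sigma as 1.483 times the median absolute deviation around the median is an assumption made for this example.

```python
import math
import statistics

def asymmetric_sigclip(values, sigclip, magsarefluxes=False):
    """Drop non-finite points, then clip dimmings and brightenings separately."""
    finite = [v for v in values if math.isfinite(v)]
    if sigclip is None:
        return finite
    if isinstance(sigclip, (int, float)):
        dim_sigma = bright_sigma = float(sigclip)  # symmetric clip
    else:
        dim_sigma, bright_sigma = sigclip  # [fainter, brighter]
    center = statistics.median(finite)
    # Assumption: robust sigma from the median absolute deviation.
    sigma = 1.483 * statistics.median([abs(v - center) for v in finite])
    kept = []
    for v in finite:
        # Positive 'dimming' means fainter: a larger magnitude or a smaller flux.
        dimming = (center - v) if magsarefluxes else (v - center)
        if -bright_sigma * sigma <= dimming <= dim_sigma * sigma:
            kept.append(v)
    return kept

# Hypothetical magnitudes: one 0.15 mag dimming, one 0.1 mag brightening, one nan.
mags = [10.0, 10.01, 9.99, 10.02, 9.98, 10.0, 10.15, 9.9, float("nan")]

asym = asymmetric_sigclip(mags, [10.0, 3.0])
sym = asymmetric_sigclip(mags, 3.0)
unclipped = asymmetric_sigclip(mags, None)
as_fluxes = asymmetric_sigclip(mags, [10.0, 3.0], magsarefluxes=True)
```

With sigclip=[10., 3.] the dimming at 10.15 survives while the brightening at 9.9 is removed; treating the same numbers as fluxes reverses which of the two is kept, which is why magsarefluxes has to be set correctly.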
astrobase.lcproc.checkplotproc module¶
This contains functions to postprocess checkplot pickles generated from a collection of light curves beforehand (perhaps using lcproc.checkplotgen).

astrobase.lcproc.checkplotproc.
xmatch_cplist_external_catalogs
(cplist, xmatchpkl, xmatchradiusarcsec=2.0, updateexisting=True, resultstodir=None)[source]¶ This xmatches external catalogs to a collection of checkplots.
Parameters:  cplist (list of str) – This is the list of checkplot pickle files to process.
 xmatchpkl (str) – The filename of a pickle prepared beforehand with the checkplot.pkl_xmatch.load_xmatch_external_catalogs function, containing collected external catalogs to crossmatch the objects in the input cplist against.
 xmatchradiusarcsec (float) – The match radius to use for the crossmatch in arcseconds.
 updateexisting (bool) – If this is True, will only update the xmatch dict in each checkplot pickle with any new crossmatches to the external catalogs. If False, will overwrite the xmatch dict with results from the current run.
 resultstodir (str or None) – If this is provided, then it must be a directory to write the resulting checkplots to after xmatch is done. This can be used to keep the original checkplots in pristine condition for some reason.
Returns: Returns a dict with keys = input checkplot pickle filenames and vals = xmatch status dict for each checkplot pickle.
Return type: dict
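The difference between the two updateexisting settings can be pictured with plain dicts. This is a minimal sketch of one plausible reading of "update" versus "overwrite", not the library's implementation; merge_xmatch and the catalog names are hypothetical.

```python
def merge_xmatch(existing, new, updateexisting=True):
    """Hypothetical helper showing update-vs-overwrite semantics for an xmatch dict."""
    if updateexisting:
        # keep earlier crossmatches and add (or refresh) the new ones
        merged = dict(existing)
        merged.update(new)
        return merged
    # discard earlier crossmatches and keep only the current run
    return dict(new)


existing = {"catalog_a": {"found": True}}
new = {"catalog_b": {"found": False}}

updated = merge_xmatch(existing, new, updateexisting=True)
overwritten = merge_xmatch(existing, new, updateexisting=False)
print(sorted(updated), sorted(overwritten))
```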

astrobase.lcproc.checkplotproc.
xmatch_cpdir_external_catalogs
(cpdir, xmatchpkl, cpfileglob='checkplot*.pkl*', xmatchradiusarcsec=2.0, updateexisting=True, resultstodir=None)[source]¶ This xmatches external catalogs to all checkplots in a directory.
Parameters:  cpdir (str) – This is the directory to search in for checkplots.
 xmatchpkl (str) – The filename of a pickle prepared beforehand with the checkplot.pkl_xmatch.load_xmatch_external_catalogs function, containing collected external catalogs to crossmatch the objects in the checkplots found in cpdir against.
 cpfileglob (str) – This is the UNIX fileglob to use in searching for checkplots.
 xmatchradiusarcsec (float) – The match radius to use for the crossmatch in arcseconds.
 updateexisting (bool) – If this is True, will only update the xmatch dict in each checkplot pickle with any new crossmatches to the external catalogs. If False, will overwrite the xmatch dict with results from the current run.
 resultstodir (str or None) – If this is provided, then it must be a directory to write the resulting checkplots to after xmatch is done. This can be used to keep the original checkplots in pristine condition for some reason.
Returns: Returns a dict with keys = input checkplot pickle filenames and vals = xmatch status dict for each checkplot pickle.
Return type: dict
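To see which files the default cpfileglob would pick up, the pattern can be tried against a few names with the standard library's fnmatch module. The filenames below are made up for illustration.

```python
import fnmatch

# made-up directory listing
filenames = [
    "checkplot-HAT-123.pkl",
    "checkplot-HAT-456.pkl.gz",
    "HAT-123-lc.pkl",
    "notes.txt",
]

# the default cpfileglob from the signature above
matched = fnmatch.filter(filenames, "checkplot*.pkl*")
print(matched)
```

The trailing `*` in the pattern is what lets gzipped pickles (`.pkl.gz`) match alongside plain `.pkl` files.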

astrobase.lcproc.checkplotproc.
colormagdiagram_cplist
(cplist, outpkl, color_mag1=('gaiamag', 'sdssg'), color_mag2=('kmag', 'kmag'), yaxis_mag=('gaia_absmag', 'rpmj'))[source]¶ This makes color-mag diagrams for all checkplot pickles in the provided list.
Can make an arbitrary number of CMDs given lists of x-axis colors and y-axis mags to use.
Parameters:  cplist (list of str) – This is the list of checkplot pickles to process.
 outpkl (str) – The filename of the output pickle that will contain the color-mag information for all objects in the checkplots specified in cplist.
 color_mag1 (list of str) –
This is a list of the keys in each checkplot’s objectinfo dict that will be used as color_mag1 in the equation:
x-axis color = color_mag1 - color_mag2
 color_mag2 (list of str) –
This is a list of the keys in each checkplot’s objectinfo dict that will be used as color_mag2 in the equation:
x-axis color = color_mag1 - color_mag2
 yaxis_mag (list of str) – This is a list of the keys in each checkplot’s objectinfo dict that will be used as the (absolute) magnitude y-axis of the color-mag diagrams.
Returns: The path to the generated CMD pickle file for the collection of objects in the input checkplot list.
Return type: str
Notes
This can make many CMDs in one go. For example, the default kwargs for color_mag1, color_mag2, and yaxis_mag result in two CMDs generated and written to the output pickle file:
 CMD1 -> gaiamag - kmag on the x-axis vs gaia_absmag on the y-axis
 CMD2 -> sdssg - kmag on the x-axis vs rpmj (J reduced PM) on the y-axis
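The pairing described in these notes amounts to walking the three sequences in step. The sketch below shows that pairing for a single object; the objectinfo values are invented, and cmd_points is a hypothetical helper rather than part of astrobase.

```python
color_mag1 = ("gaiamag", "sdssg")
color_mag2 = ("kmag", "kmag")
yaxis_mag = ("gaia_absmag", "rpmj")

# invented magnitudes for one object
objectinfo = {"gaiamag": 12.5, "sdssg": 13.1, "kmag": 10.9,
              "gaia_absmag": 4.8, "rpmj": 6.2}


def cmd_points(objectinfo, color_mag1, color_mag2, yaxis_mag):
    """Hypothetical helper: one (x-axis color, y-axis mag) point per CMD."""
    points = {}
    for c1, c2, ymag in zip(color_mag1, color_mag2, yaxis_mag):
        label = f"{c1}-{c2}/{ymag}"
        # x-axis color = color_mag1 - color_mag2
        points[label] = (objectinfo[c1] - objectinfo[c2], objectinfo[ymag])
    return points


points = cmd_points(objectinfo, color_mag1, color_mag2, yaxis_mag)
for label, (color, mag) in points.items():
    print(label, round(color, 3), mag)
```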

astrobase.lcproc.checkplotproc.
colormagdiagram_cpdir
(cpdir, outpkl, cpfileglob='checkplot*.pkl*', color_mag1=('gaiamag', 'sdssg'), color_mag2=('kmag', 'kmag'), yaxis_mag=('gaia_absmag', 'rpmj'))[source]¶ This makes CMDs for all checkplot pickles in the provided directory.
Can make an arbitrary number of CMDs given lists of x-axis colors and y-axis mags to use.
Parameters:  cpdir (str) – This is the directory to get the list of input checkplot pickles from.
 outpkl (str) – The filename of the output pickle that will contain the color-mag information for all objects in the checkplots found in cpdir.
 cpfileglob (str) – The UNIX fileglob to use to search for checkplot pickle files.
 color_mag1 (list of str) –
This is a list of the keys in each checkplot’s objectinfo dict that will be used as color_mag1 in the equation:
x-axis color = color_mag1 - color_mag2
 color_mag2 (list of str) –
This is a list of the keys in each checkplot’s objectinfo dict that will be used as color_mag2 in the equation:
x-axis color = color_mag1 - color_mag2
 yaxis_mag (list of str) – This is a list of the keys in each checkplot’s objectinfo dict that will be used as the (absolute) magnitude y-axis of the color-mag diagrams.
Returns: The path to the generated CMD pickle file for the collection of objects in the input checkplot directory.
Return type: str
Notes
This can make many CMDs in one go. For example, the default kwargs for color_mag1, color_mag2, and yaxis_mag result in two CMDs generated and written to the output pickle file:
 CMD1 -> gaiamag - kmag on the x-axis vs gaia_absmag on the y-axis
 CMD2 -> sdssg - kmag on the x-axis vs rpmj (J reduced PM) on the y-axis

astrobase.lcproc.checkplotproc.
add_cmd_to_checkplot
(cpx, cmdpkl, require_cmd_magcolor=True, save_cmd_pngs=False)[source]¶ This adds CMD figures to a checkplot dict or pickle.
Looks up the CMDs in cmdpkl, adds the object from cpx as a gold(ish) star in the plot, and then saves the figure to a base64 encoded PNG, which can then be read and used by the checkplotserver.
Parameters:  cpx (str or dict) – This is the input checkplot pickle or dict to add the CMD to.
 cmdpkl (str or dict) – The CMD pickle generated by the colormagdiagram_cplist or colormagdiagram_cpdir functions above, or the dict produced by reading this pickle in.
 require_cmd_magcolor (bool) – If this is True, a CMD plot will not be made if the color and mag keys required by the CMD are not present or are nan in this checkplot’s objectinfo dict.
 save_cmd_pngs (bool) – If this is True, then will save the CMD plots that were generated and added back to the checkplotdict as PNGs to the same directory as cpx. If cpx is a dict, will save them to the current working directory.
Returns: If cpx was a str filename of checkplot pickle, this will return that filename to indicate that the CMD was added to the file. If cpx was a checkplotdict, this will return the checkplotdict with a new key called ‘colormagdiagram’ containing the base64 encoded PNG binary streams of all CMDs generated.
Return type: str or dict
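The require_cmd_magcolor condition ("not present or are nan") can be expressed as a small check over the objectinfo dict. This is a sketch under that reading of the parameter description; has_cmd_inputs is a hypothetical name and the values are invented.

```python
import math


def has_cmd_inputs(objectinfo, required_keys):
    """Hypothetical check: every key needed by a CMD is present and not nan."""
    for key in required_keys:
        value = objectinfo.get(key)
        if value is None or math.isnan(float(value)):
            return False
    return True


required = ("gaiamag", "kmag", "gaia_absmag")
complete = {"gaiamag": 12.5, "kmag": 10.9, "gaia_absmag": 4.8}
missing_key = {"gaiamag": 12.5, "kmag": 10.9}
nan_value = {"gaiamag": 12.5, "kmag": float("nan"), "gaia_absmag": 4.8}

print(has_cmd_inputs(complete, required),
      has_cmd_inputs(missing_key, required),
      has_cmd_inputs(nan_value, required))
```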

astrobase.lcproc.checkplotproc.
add_cmds_cplist
(cplist, cmdpkl, require_cmd_magcolor=True, save_cmd_pngs=False)[source]¶ This adds CMDs for each object in cplist.
Parameters:  cplist (list of str) – This is the input list of checkplot pickles to add the CMDs to.
 cmdpkl (str) – This is the filename of the CMD pickle created previously.
 require_cmd_magcolor (bool) – If this is True, a CMD plot will not be made if the color and mag keys required by the CMD are not present or are nan in each checkplot’s objectinfo dict.
 save_cmd_pngs (bool) – If this is True, then will save the CMD plots that were generated and added back to the checkplotdict as PNGs to the same directory as each checkplot pickle.
Returns: Nothing.

astrobase.lcproc.checkplotproc.
add_cmds_cpdir
(cpdir, cmdpkl, cpfileglob='checkplot*.pkl*', require_cmd_magcolor=True, save_cmd_pngs=False)[source]¶ This adds CMDs for each object in cpdir.
Parameters:  cpdir (str) – This is the directory to search for checkplot pickles.
 cmdpkl (str) – This is the filename of the CMD pickle created previously.
 cpfileglob (str) – The UNIX fileglob to use when searching for checkplot pickles to operate on.
 require_cmd_magcolor (bool) – If this is True, a CMD plot will not be made if the color and mag keys required by the CMD are not present or are nan in each checkplot’s objectinfo dict.
 save_cmd_pngs (bool) – If this is True, then will save the CMD plots that were generated and added back to the checkplotdict as PNGs to the same directory as each checkplot pickle.
Returns: Nothing.

astrobase.lcproc.checkplotproc.
cp_objectinfo_worker
(task)[source]¶ This is a parallel worker for parallel_update_objectinfo_cplist.
Parameters: task (tuple) –  task[0] = checkplot pickle file
 task[1] = kwargs
Returns: The name of the checkplot file that was updated. None if the update fails for some reason.
Return type: str or None
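The worker contract above (a two-element task, returning the filename on success and None on failure) follows a common pattern for parallel workers. The sketch below imitates it with a stand-in update function; update_worker and fake_update are hypothetical and do not reproduce what astrobase does internally.

```python
def update_worker(task, update_func):
    """Hypothetical worker: task[0] = checkplot pickle file, task[1] = kwargs."""
    cpfile, kwargs = task
    try:
        update_func(cpfile, **kwargs)
        return cpfile
    except Exception:
        # a failed update is reported as None instead of raising
        return None


def fake_update(cpfile, verbose=True):
    """Stand-in for the real update step; fails for one made-up filename."""
    if cpfile == "broken.pkl":
        raise IOError("could not read " + cpfile)


tasks = [("checkplot-a.pkl", {"verbose": False}), ("broken.pkl", {})]
results = [update_worker(task, fake_update) for task in tasks]
print(results)
```

Returning None rather than raising keeps one bad file from stopping the rest of a parallel run.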

astrobase.lcproc.checkplotproc.
parallel_update_objectinfo_cplist
(cplist, liststartindex=None, maxobjects=None, nworkers=2, fast_mode=False, findercmap='gray_r', finderconvolve=None, deredden_object=True, custom_bandpasses=None, gaia_submit_timeout=10.0, gaia_submit_tries=3, gaia_max_timeout=180.0, gaia_mirror=None, complete_query_later=True, lclistpkl=None, nbrradiusarcsec=60.0, maxnumneighbors=5, plotdpi=100, findercachedir='~/.astrobase/stampcache', verbose=True)[source]¶ This updates objectinfo for a list of checkplots.
Useful in cases where a previous round of GAIA/finderchart/external catalog acquisition failed. This will preserve the following keys in the checkplots if they exist:
 comments
 varinfo
 objectinfo.objecttags
Parameters:  cplist (list of str) – A list of checkplot pickle file names to update.
 liststartindex (int) – The index of the input list to start working at.
 maxobjects (int) – The maximum number of objects to process in this run. Use this with liststartindex to effectively distribute working on a large list of input checkplot pickles over several sessions or machines.
 nworkers (int) – The number of parallel workers that will work on the checkplot update process.
 fast_mode (bool or float) – This runs the external catalog operations in a “fast” mode, with short timeouts and not trying to hit external catalogs that take a long time to respond. See the docstring for checkplot.pkl_utils._pkl_finder_objectinfo for details on how this works. If this is True, will run in “fast” mode with default timeouts (5 seconds in most cases). If this is a float, will run in “fast” mode with the provided timeout value in seconds.
 findercmap (str or matplotlib.cm.ColorMap object) – The Colormap object to use for the finder chart image.
 finderconvolve (astropy.convolution.Kernel object or None) – If not None, the Kernel object to use for convolving the finder image.
 deredden_object (bool) – If this is True, will use the 2MASS DUST service to get extinction coefficients in various bands, and then try to deredden the magnitudes and colors of the object already present in the checkplot’s objectinfo dict.
 custom_bandpasses (dict) – This is a dict used to provide custom bandpass definitions for any magnitude measurements in the objectinfo dict that are not automatically recognized by the varclass.starfeatures.color_features function. See its docstring for details on the required format.
 gaia_submit_timeout (float) – Sets the timeout in seconds to use when submitting a request to look up the object’s information to the GAIA service. Note that if fast_mode is set, this is ignored.
 gaia_submit_tries (int) – Sets the maximum number of times the GAIA services will be contacted to obtain this object’s information. If fast_mode is set, this is ignored, and the services will be contacted only once (meaning that a failure to respond will be silently ignored and no GAIA data will be added to the checkplot’s objectinfo dict).
 gaia_max_timeout (float) – Sets the timeout in seconds to use when waiting for the GAIA service to respond to our request for the object’s information. Note that if fast_mode is set, this is ignored.
 gaia_mirror (str) – This sets the GAIA mirror to use. This is a key in the services.gaia.GAIA_URLS dict which defines the URLs to hit for each mirror.
 complete_query_later (bool) – If this is True, saves the state of GAIA queries that are not yet complete when gaia_max_timeout is reached while waiting for the GAIA service to respond to our request. A later call for GAIA info on the same object will attempt to pick up the results from the existing query if it’s completed. If fast_mode is True, this is ignored.
 lclistpkl (dict or str) – If this is provided, must be a dict resulting from reading a catalog produced by the lcproc.catalogs.make_lclist function or a str path pointing to the pickle file produced by that function. This catalog is used to find neighbors of the current object in the current light curve collection. Looking at neighbors of the object within the radius specified by nbrradiusarcsec is useful for light curves produced by instruments that have a large pixel scale, so are susceptible to blending of variability and potential confusion of neighbor variability with that of the actual object being looked at. If this is None, no neighbor lookups will be performed.
 nbrradiusarcsec (float) – The radius in arcseconds to use for a search conducted around the coordinates of this object to look for any potential confusion and blending of variability amplitude caused by their proximity.
 maxnumneighbors (int) – The maximum number of neighbors that will have their light curves and magnitudes noted in this checkplot as potential blends with the target object.
 plotdpi (int) – The resolution in DPI of the plots to generate in this function (e.g. the finder chart, etc.)
 findercachedir (str) – The path to the astrobase cache directory for finder chart downloads from the NASA SkyView service.
 verbose (bool) – If True, will indicate progress and warn about potential problems.
Returns: Paths to the updated checkplot pickle files.
Return type: list of str
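Splitting a long list of checkplots across sessions or machines with liststartindex and maxobjects can be pictured as taking slices of the input list. The sketch assumes the two kwargs behave like a plain slice (start at liststartindex, take at most maxobjects); select_chunk and the filenames are hypothetical.

```python
def select_chunk(cplist, liststartindex=None, maxobjects=None):
    """Hypothetical helper: the part of cplist one session would process."""
    start = liststartindex or 0
    if maxobjects is None:
        return cplist[start:]
    return cplist[start:start + maxobjects]


# ten made-up checkplot pickles split into chunks of four
cplist = [f"checkplot-{i:02d}.pkl" for i in range(10)]
session1 = select_chunk(cplist, liststartindex=0, maxobjects=4)
session2 = select_chunk(cplist, liststartindex=4, maxobjects=4)
session3 = select_chunk(cplist, liststartindex=8, maxobjects=4)
print(len(session1), len(session2), len(session3))
```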

astrobase.lcproc.checkplotproc.
parallel_update_objectinfo_cpdir
(cpdir, cpglob='checkplot*.pkl*', liststartindex=None, maxobjects=None, nworkers=2, fast_mode=False, findercmap='gray_r', finderconvolve=None, deredden_object=True, custom_bandpasses=None, gaia_submit_timeout=10.0, gaia_submit_tries=3, gaia_max_timeout=180.0, gaia_mirror=None, complete_query_later=True, lclistpkl=None, nbrradiusarcsec=60.0, maxnumneighbors=5, plotdpi=100, findercachedir='~/.astrobase/stampcache', verbose=True)[source]¶ This updates the objectinfo for a directory of checkplot pickles.
Useful in cases where a previous round of GAIA/finderchart/external catalog acquisition failed. This will preserve the following keys in the checkplots if they exist:
 comments
 varinfo
 objectinfo.objecttags
Parameters:  cpdir (str) – The directory to look for checkplot pickles in.
 cpglob (str) – The UNIX fileglob to use when searching for checkplot pickle files.
 liststartindex (int) – The index of the input list to start working at.
 maxobjects (int) – The maximum number of objects to process in this run. Use this with liststartindex to effectively distribute working on a large list of input checkplot pickles over several sessions or machines.
 nworkers (int) – The number of parallel workers that will work on the checkplot update process.
 fast_mode (bool or float) – This runs the external catalog operations in a “fast” mode, with short timeouts and not trying to hit external catalogs that take a long time to respond. See the docstring for checkplot.pkl_utils._pkl_finder_objectinfo for details on how this works. If this is True, will run in “fast” mode with default timeouts (5 seconds in most cases). If this is a float, will run in “fast” mode with the provided timeout value in seconds.
 findercmap (str or matplotlib.cm.ColorMap object) – The Colormap object to use for the finder chart image.
 finderconvolve (astropy.convolution.Kernel object or None) – If not None, the Kernel object to use for convolving the finder image.
 deredden_object (bool) – If this is True, will use the 2MASS DUST service to get extinction coefficients in various bands, and then try to deredden the magnitudes and colors of the object already present in the checkplot’s objectinfo dict.
 custom_bandpasses (dict) – This is a dict used to provide custom bandpass definitions for any magnitude measurements in the objectinfo dict that are not automatically recognized by the varclass.starfeatures.color_features function. See its docstring for details on the required format.
 gaia_submit_timeout (float) – Sets the timeout in seconds to use when submitting a request to look up the object’s information to the GAIA service. Note that if fast_mode is set, this is ignored.
 gaia_submit_tries (int) – Sets the maximum number of times the GAIA services will be contacted to obtain this object’s information. If fast_mode is set, this is ignored, and the services will be contacted only once (meaning that a failure to respond will be silently ignored and no GAIA data will be added to the checkplot’s objectinfo dict).
 gaia_max_timeout (float) – Sets the timeout in seconds to use when waiting for the GAIA service to respond to our request for the object’s information. Note that if fast_mode is set, this is ignored.
 gaia_mirror (str) – This sets the GAIA mirror to use. This is a key in the services.gaia.GAIA_URLS dict which defines the URLs to hit for each mirror.
 complete_query_later (bool) – If this is True, saves the state of GAIA queries that are not yet complete when gaia_max_timeout is reached while waiting for the GAIA service to respond to our request. A later call for GAIA info on the same object will attempt to pick up the results from the existing query if it’s completed. If fast_mode is True, this is ignored.
 lclistpkl (dict or str) – If this is provided, must be a dict resulting from reading a catalog produced by the lcproc.catalogs.make_lclist function or a str path pointing to the pickle file produced by that function. This catalog is used to find neighbors of the current object in the current light curve collection. Looking at neighbors of the object within the radius specified by nbrradiusarcsec is useful for light curves produced by instruments that have a large pixel scale, so are susceptible to blending of variability and potential confusion of neighbor variability with that of the actual object being looked at. If this is None, no neighbor lookups will be performed.
 nbrradiusarcsec (float) – The radius in arcseconds to use for a search conducted around the coordinates of this object to look for any potential confusion and blending of variability amplitude caused by their proximity.
 maxnumneighbors (int) – The maximum number of neighbors that will have their light curves and magnitudes noted in this checkplot as potential blends with the target object.
 plotdpi (int) – The resolution in DPI of the plots to generate in this function (e.g. the finder chart, etc.)
 findercachedir (str) – The path to the astrobase cache directory for finder chart downloads from the NASA SkyView service.
 verbose (bool) – If True, will indicate progress and warn about potential problems.
Returns: Paths to the updated checkplot pickle files.
Return type: list of str
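Since fast_mode accepts either a bool or a float, code that consumes it has to tell the two apart. The sketch below shows one way to turn the value into a timeout; resolve_fast_mode is hypothetical, and the 5-second default is taken from the "in most cases" figure quoted above rather than from the library's code.

```python
def resolve_fast_mode(fast_mode, default_fast_timeout=5.0):
    """Hypothetical helper: return a timeout in seconds, or None for normal mode."""
    if fast_mode is True:
        return default_fast_timeout
    # bool is a subclass of int, so exclude it before the numeric check
    if (isinstance(fast_mode, (int, float))
            and not isinstance(fast_mode, bool)
            and fast_mode > 0):
        return float(fast_mode)
    return None


print(resolve_fast_mode(True), resolve_fast_mode(2.5), resolve_fast_mode(False))
```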
astrobase.lcproc.epd module¶
This contains functions to run External Parameter Decorrelation (EPD) on a large collection of light curves.

astrobase.lcproc.epd.
apply_epd_magseries
(lcfile, timecol, magcol, errcol, externalparams, lcformat='hatsql', lcformatdir=None, epdsmooth_sigclip=3.0, epdsmooth_windowsize=21, epdsmooth_func=<function smooth_magseries_savgol>, epdsmooth_extraparams=None)[source]¶ This applies external parameter decorrelation (EPD) to a light curve.
Parameters:  lcfile (str) – The filename of the light curve file to process.
 timecol,magcol,errcol (str) – The keys in the lcdict produced by your light curve reader function that correspond to the times, mags/fluxes, and associated measurement errors that will be used as input to the EPD process.
 externalparams (dict or None) –
This is a dict that indicates which keys in the lcdict obtained from the lcfile correspond to the required external parameters. As with timecol, magcol, and errcol, these can be simple keys (e.g. ‘rjd’) or compound keys (‘magaperture1.mags’). The dict should look something like:
{'fsv':'<lcdict key>' array: S values for each observation,
 'fdv':'<lcdict key>' array: D values for each observation,
 'fkv':'<lcdict key>' array: K values for each observation,
 'xcc':'<lcdict key>' array: x coords for each observation,
 'ycc':'<lcdict key>' array: y coords for each observation,
 'bgv':'<lcdict key>' array: sky background for each observation,
 'bge':'<lcdict key>' array: sky background err for each observation,
 'iha':'<lcdict key>' array: hour angle for each observation,
 'izd':'<lcdict key>' array: zenith distance for each observation}
Alternatively, if these exact keys are already present in the lcdict, indicate this by setting externalparams to None.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curve file specified in lcfile.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 epdsmooth_sigclip (float or int or sequence of two floats/ints or None) –
This specifies how to sigma-clip the input LC before fitting the EPD function to it.
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, epdsmooth_sigclip=[10., 3.] will clip out dimmings greater than 10-sigma and brightenings greater than 3-sigma. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If epdsmooth_sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 epdsmooth_windowsize (int) – This is the number of LC points to smooth over to generate a smoothed light curve that will be used to fit the EPD function.
 epdsmooth_func (Python function) –
This sets the smoothing filter function to use. A Savitzky-Golay filter is used to smooth the light curve by default. The functions that can be used with this kwarg are listed in varbase.trends. If you want to use your own function, it MUST have the following signature:
def smoothfunc(mags_array, window_size, **extraparams)
and return a numpy array of the same size as mags_array with the smoothed timeseries. Any extra params can be provided using the extraparams dict.
 epdsmooth_extraparams (dict) – This is a dict of any extra filter params to supply to the smoothing function.
Returns: Writes the output EPD light curve to a pickle that contains the lcdict with an added lcdict[‘epd’] key, which contains the EPD times, mags/fluxes, and errs as lcdict[‘epd’][‘times’], lcdict[‘epd’][‘mags’], and lcdict[‘epd’][‘errs’]. Returns the filename of this generated EPD LC pickle file.
Return type: str
Notes
 S -> measure of PSF sharpness (~1/sigma^2, so smaller S = wider PSF)
 D -> measure of PSF ellipticity in x-y direction
 K -> measure of PSF ellipticity in cross direction
S, D, K are related to the PSF’s variance and covariance, see eqns 30-33 in A. Pal’s thesis: https://arxiv.org/abs/0906.3486
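A custom function for epdsmooth_func only has to honor the signature given above and return an array of the same size as its input. The sketch below is a running-median smoother written to that contract; the name running_median_smooth and the pad_mode extra parameter are hypothetical and are not part of varbase.trends.

```python
import numpy as np


def running_median_smooth(mags_array, window_size, **extraparams):
    """Hypothetical smoother with the (mags_array, window_size, **extraparams) signature."""
    mags = np.asarray(mags_array, dtype=float)
    # force an odd window so it is centered on each point
    if window_size % 2 == 0:
        window_size += 1
    half = window_size // 2
    # 'pad_mode' is an illustrative extra parameter, not an astrobase option
    padded = np.pad(mags, half, mode=extraparams.get("pad_mode", "edge"))
    smoothed = [np.median(padded[i:i + window_size]) for i in range(mags.size)]
    return np.array(smoothed)


mags = np.array([1.0, 1.0, 10.0, 1.0, 1.0])
smoothed = running_median_smooth(mags, 3)
print(smoothed)
```

Padding the ends before taking the windowed median is what keeps the output the same length as mags_array, which the signature contract requires.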

astrobase.lcproc.epd.
parallel_epd_worker
(task)[source]¶ This is a parallel worker for the function below.
Parameters: task (tuple) –  task[0] = lcfile
 task[1] = timecol
 task[2] = magcol
 task[3] = errcol
 task[4] = externalparams
 task[5] = lcformat
 task[6] = lcformatdir
 task[7] = epdsmooth_sigclip
 task[8] = epdsmooth_windowsize
 task[9] = epdsmooth_func
 task[10] = epdsmooth_extraparams
Returns: If EPD succeeds for an input LC, returns the filename of the output EPD LC pickle file. If it fails, returns None.
Return type: str or None
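The task layout listed above is an 11-element tuple per light curve. A sketch of building such tuples for a list of files is shown below; make_epd_tasks is a hypothetical helper, and the filenames and the 'mag' and 'err' column names are made up for illustration.

```python
def make_epd_tasks(lclist, timecol, magcol, errcol,
                   externalparams=None, lcformat="hatsql", lcformatdir=None,
                   epdsmooth_sigclip=3.0, epdsmooth_windowsize=21,
                   epdsmooth_func=None, epdsmooth_extraparams=None):
    """Hypothetical helper: one task tuple per light curve, in task[0]..task[10] order."""
    return [
        (lcfile, timecol, magcol, errcol, externalparams,
         lcformat, lcformatdir, epdsmooth_sigclip, epdsmooth_windowsize,
         epdsmooth_func, epdsmooth_extraparams)
        for lcfile in lclist
    ]


# made-up file and column names
tasks = make_epd_tasks(["lc-001.pkl", "lc-002.pkl"], "rjd", "mag", "err")
print(len(tasks), len(tasks[0]))
```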

astrobase.lcproc.epd.
parallel_epd_lclist
(lclist, externalparams, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, epdsmooth_sigclip=3.0, epdsmooth_windowsize=21, epdsmooth_func=<function smooth_magseries_savgol>, epdsmooth_extraparams=None, nworkers=2, maxworkertasks=1000)[source]¶ This applies EPD in parallel to all LCs in the input list.
Parameters:  lclist (list of str) – This is the list of light curve files to run EPD on.
 externalparams (dict or None) –
This is a dict that indicates which keys in the lcdict obtained from the lcfile correspond to the required external parameters. As with timecol, magcol, and errcol, these can be simple keys (e.g. ‘rjd’) or compound keys (‘magaperture1.mags’). The dict should look something like:
{'fsv':'<lcdict key>' array: S values for each observation,
 'fdv':'<lcdict key>' array: D values for each observation,
 'fkv':'<lcdict key>' array: K values for each observation,
 'xcc':'<lcdict key>' array: x coords for each observation,
 'ycc':'<lcdict key>' array: y coords for each observation,
 'bgv':'<lcdict key>' array: sky background for each observation,
 'bge':'<lcdict key>' array: sky background err for each observation,
 'iha':'<lcdict key>' array: hour angle for each observation,
 'izd':'<lcdict key>' array: zenith distance for each observation}
Alternatively, if these exact keys are already present in the lcdict, indicate this by setting externalparams to None.
 timecols,magcols,errcols (lists of str) – The keys in the lcdict produced by your light curve reader function that correspond to the times, mags/fluxes, and associated measurement errors that will be used as inputs to the EPD process. If these are None, the default values for timecols, magcols, and errcols for your light curve format will be used here.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curve files.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 epdsmooth_sigclip (float or int or sequence of two floats/ints or None) –
This specifies how to sigma-clip the input LC before fitting the EPD function to it.
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, epdsmooth_sigclip=[10., 3.] will clip out dimmings greater than 10-sigma and brightenings greater than 3-sigma. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If epdsmooth_sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 epdsmooth_windowsize (int) – This is the number of LC points to smooth over to generate a smoothed light curve that will be used to fit the EPD function.
 epdsmooth_func (Python function) –
This sets the smoothing filter function to use. A Savitzky-Golay filter is used to smooth the light curve by default. The functions that can be used with this kwarg are listed in varbase.trends. If you want to use your own function, it MUST have the following signature:
def smoothfunc(mags_array, window_size, **extraparams)
and return a numpy array of the same size as mags_array with the smoothed timeseries. Any extra params can be provided using the extraparams dict.
 epdsmooth_extraparams (dict) – This is a dict of any extra filter params to supply to the smoothing function.
 nworkers (int) – The number of parallel workers to launch when processing the LCs.
 maxworkertasks (int) – The maximum number of tasks a parallel worker will complete before it is replaced with a new one (sometimes helps with memory leaks).
Returns: Returns a dict organized by all the keys in the input magcols list, containing lists of EPD pickle light curves for that magcol.
Return type: dict
Notes
 S -> measure of PSF sharpness (~1/sigma^2, so smaller S = wider PSF)
 D -> measure of PSF ellipticity in x-y direction
 K -> measure of PSF ellipticity in cross direction
S, D, K are related to the PSF’s variance and covariance, see eqns 30-33 in A. Pal’s thesis: https://arxiv.org/abs/0906.3486
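The asymmetric form of epdsmooth_sigclip described in the parameters above is easier to follow with numbers. The sketch below builds a keep-mask for a [dimming, brightening] pair of sigma limits. It assumes the scatter is estimated as 1.483 times the median absolute deviation about the median, which is a common robust choice and not necessarily what astrobase uses; asymmetric_sigclip_mask is a hypothetical name and the magnitudes are invented.

```python
import numpy as np


def asymmetric_sigclip_mask(mags, sigclip=(10.0, 3.0), magsarefluxes=False):
    """Hypothetical helper: True where a point survives the asymmetric clip."""
    mags = np.asarray(mags, dtype=float)
    finite = np.isfinite(mags)
    med = np.median(mags[finite])
    # robust scatter estimate (an assumption of this sketch)
    sigma = 1.483 * np.median(np.abs(mags[finite] - med))
    dim_sigma, bright_sigma = sigclip
    # positive 'dimming' means fainter: larger magnitude or smaller flux
    dimming = (med - mags) if magsarefluxes else (mags - med)
    return finite & (dimming < dim_sigma * sigma) & (-dimming < bright_sigma * sigma)


# invented magnitudes: 11.0 is a dimming, 9.0 is a brightening of equal size
mags = np.array([10.0, 10.1, 9.9, 10.05, 9.95, 10.0, 11.0, 9.0])
keep_as_mags = asymmetric_sigclip_mask(mags, sigclip=(10.0, 3.0))
keep_as_fluxes = asymmetric_sigclip_mask(mags, sigclip=(10.0, 3.0), magsarefluxes=True)
print(keep_as_mags)
print(keep_as_fluxes)
```

With the same numbers, the point that gets clipped flips when the values are treated as fluxes, which is why the direction of "dimming" has to be known before clipping.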

astrobase.lcproc.epd.
parallel_epd_lcdir
(lcdir, externalparams, lcfileglob=None, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, epdsmooth_sigclip=3.0, epdsmooth_windowsize=21, epdsmooth_func=<function smooth_magseries_savgol>, epdsmooth_extraparams=None, nworkers=2, maxworkertasks=1000)[source]¶ This applies EPD in parallel to all LCs in a directory.
Parameters:  lcdir (str) – The light curve directory to process.
 externalparams (dict or None) –
This is a dict that indicates which keys in the lcdict obtained from the lcfile correspond to the required external parameters. As with timecol, magcol, and errcol, these can be simple keys (e.g. ‘rjd’) or compound keys (‘magaperture1.mags’). The dict should look something like:
{'fsv':'<lcdict key>' array: S values for each observation,
 'fdv':'<lcdict key>' array: D values for each observation,
 'fkv':'<lcdict key>' array: K values for each observation,
 'xcc':'<lcdict key>' array: x coords for each observation,
 'ycc':'<lcdict key>' array: y coords for each observation,
 'bgv':'<lcdict key>' array: sky background for each observation,
 'bge':'<lcdict key>' array: sky background err for each observation,
 'iha':'<lcdict key>' array: hour angle for each observation,
 'izd':'<lcdict key>' array: zenith distance for each observation}
 lcfileglob (str or None) – A UNIX fileglob to use to select light curve files in lcdir. If this is not None, the value provided will override the default fileglob for your light curve format.
 timecols,magcols,errcols (lists of str) – The keys in the lcdict produced by your light curve reader function that correspond to the times, mags/fluxes, and associated measurement errors that will be used as inputs to the EPD process. If these are None, the default values for timecols, magcols, and errcols for your light curve format will be used here.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curve files in lcdir.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 epdsmooth_sigclip (float or int or sequence of two floats/ints or None) –
This specifies how to sigma-clip the input LC before fitting the EPD function to it.
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input timeseries.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, epdsmooth_sigclip=[10., 3.] will clip out dimmings greater than 10-sigma and brightenings greater than 3-sigma. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If epdsmooth_sigclip is None, no sigma-clipping will be performed, and the timeseries (with non-finite elems removed) will be passed through to the output.
 epdsmooth_windowsize (int) – This is the number of LC points to smooth over to generate a smoothed light curve that will be used to fit the EPD function.
 epdsmooth_func (Python function) –
This sets the smoothing filter function to use. A Savitzky-Golay filter is used to smooth the light curve by default. The functions that can be used with this kwarg are listed in varbase.trends. If you want to use your own function, it MUST have the following signature:
def smoothfunc(mags_array, window_size, **extraparams)
and return a numpy array of the same size as mags_array with the smoothed timeseries. Any extra params can be provided using the extraparams dict.
 epdsmooth_extraparams (dict) – This is a dict of any extra filter params to supply to the smoothing function.
 nworkers (int) – The number of parallel workers to launch when processing the LCs.
 maxworkertasks (int) – The maximum number of tasks a parallel worker will complete before it is replaced with a new one (sometimes helps with memory leaks).
Returns: Returns a dict organized by all the keys in the input magcols list, containing lists of EPD pickle light curves for that magcol.
Return type: dict
Notes
 S -> measure of PSF sharpness (~1/sigma^2, so smaller S = wider PSF)
 D -> measure of PSF ellipticity in x-y direction
 K -> measure of PSF ellipticity in cross direction
S, D, K are related to the PSF’s variance and covariance, see eqn 30-33 in A. Pal’s thesis: https://arxiv.org/abs/0906.3486
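A custom smoothing function only has to match the signature given for epdsmooth_func and return an array of the same length as its input. The following is a minimal sketch, not one of the filters listed in varbase.trends: the name boxcar_smooth and its edge_mode extra parameter are hypothetical and exist only for this illustration.

```python
import numpy as np


def boxcar_smooth(mags_array, window_size, **extraparams):
    """Moving-average smoother with the (mags_array, window_size, **extraparams) signature."""
    # 'edge_mode' is a hypothetical extra parameter: how to pad the array ends
    edge_mode = extraparams.get('edge_mode', 'edge')

    mags = np.asarray(mags_array, dtype=float)
    window = int(window_size)
    if window < 2:
        return mags.copy()

    # pad both ends so the 'valid' convolution returns len(mags) points
    left = window // 2
    right = window - 1 - left
    padded = np.pad(mags, (left, right), mode=edge_mode)

    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode='valid')
```

Under these assumptions, such a function could be supplied as epdsmooth_func=boxcar_smooth, with epdsmooth_extraparams={'edge_mode': 'reflect'} carrying the extra setting.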
astrobase.lcproc.lcbin module¶
This contains parallelized functions to bin large numbers of light curves in time.

astrobase.lcproc.lcbin.
timebinlc
(lcfile, binsizesec, outdir=None, lcformat='hatsql', lcformatdir=None, timecols=None, magcols=None, errcols=None, minbinelems=7)[source]¶ This bins the given light curve file in time using the specified bin size.
Parameters:  lcfile (str) – The file name to process.
 binsizesec (float) – The time binsize in seconds.
 outdir (str or None) – If this is a str, the output LC will be written to outdir. If this is None, the output LC will be written to the same directory as lcfile.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curve file.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 timecols,magcols,errcols (lists of str) – The keys in the lcdict produced by your light curve reader function that correspond to the times, mags/fluxes, and associated measurement errors that will be used as inputs to the binning process. If these are None, the default values for timecols, magcols, and errcols for your light curve format will be used here.
 minbinelems (int) – The minimum number of timebin elements required to accept a timebin as valid for the output binned light curve.
Returns: The name of the output pickle file with the binned LC.
Writes the output binned light curve to a pickle that contains the lcdict with an added lcdict[‘binned’][magcol] key, which contains the binned times, mags/fluxes, and errs as lcdict[‘binned’][magcol][‘times’], lcdict[‘binned’][magcol][‘mags’], and lcdict[‘binned’][magcol][‘errs’] for each magcol provided in the input or default magcols value for this light curve format.
Return type: str
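The roles of binsizesec and minbinelems can be illustrated with a small standalone sketch. This is not the binning code that Astrobase runs: the function name bin_in_time, the use of medians within each bin, and the assumption that times are given in days are all choices made for this example.

```python
import numpy as np


def bin_in_time(times, mags, errs, binsizesec, minbinelems=7):
    """Median-bin a light curve; times are assumed to be in days."""
    times, mags, errs = (np.asarray(a, dtype=float) for a in (times, mags, errs))
    binsize_days = binsizesec / 86400.0

    # assign every observation to a bin counted from the first time stamp
    bin_index = np.floor((times - times.min()) / binsize_days).astype(int)

    out_t, out_m, out_e = [], [], []
    for idx in np.unique(bin_index):
        in_bin = bin_index == idx
        # bins with fewer than minbinelems points are not accepted
        if in_bin.sum() < minbinelems:
            continue
        out_t.append(np.median(times[in_bin]))
        out_m.append(np.median(mags[in_bin]))
        out_e.append(np.median(errs[in_bin]))

    return np.array(out_t), np.array(out_m), np.array(out_e)
```

Raising minbinelems trades time coverage for more reliable bins: sparsely sampled stretches of the light curve drop out of the binned output entirely.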

astrobase.lcproc.lcbin.
timebinlc_worker
(task)[source]¶ This is a parallel worker for the function below.
Parameters: task (tuple) – This is of the form:
task[0] = lcfile
task[1] = binsizesec
task[2] = {'outdir','lcformat','lcformatdir', 'timecols','magcols','errcols','minbinelems'}
Returns: The output pickle file with the binned LC if successful. None otherwise. Return type: str

astrobase.lcproc.lcbin.
parallel_timebin
(lclist, binsizesec, maxobjects=None, outdir=None, lcformat='hatsql', lcformatdir=None, timecols=None, magcols=None, errcols=None, minbinelems=7, nworkers=2, maxworkertasks=1000)[source]¶ This timebins all the LCs in the list using the specified bin size.
Parameters:  lclist (list of str) – The input LCs to process.
 binsizesec (float) – The time bin size to use in seconds.
 maxobjects (int or None) – If provided, LC processing will stop at lclist[maxobjects].
 outdir (str or None) – The directory where output LCs will be written. If None, will write to the same directory as the input LCs.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curve file.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 timecols,magcols,errcols (lists of str) – The keys in the lcdict produced by your light curve reader function that correspond to the times, mags/fluxes, and associated measurement errors that will be used as inputs to the binning process. If these are None, the default values for timecols, magcols, and errcols for your light curve format will be used here.
 minbinelems (int) – The minimum number of timebin elements required to accept a timebin as valid for the output binned light curve.
 nworkers (int) – Number of parallel workers to launch.
 maxworkertasks (int) – The maximum number of tasks a parallel worker will complete before being replaced to guard against memory leaks.
Returns: The returned dict contains keys = input LCs, vals = output LCs.
Return type: dict
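The general pattern behind nworkers and maxworkertasks (one task tuple per light curve, mapped over a pool whose workers are recycled after a fixed number of tasks) can be sketched with the standard library. This is only an illustration of that pattern, assuming a three-element task of file, bin size, and keyword dict; build_tasks, demo_worker, and run_parallel are hypothetical names, and demo_worker just fabricates an output file name instead of binning anything.

```python
import os
from multiprocessing import Pool


def build_tasks(lclist, binsizesec, **kwargs):
    # one (lcfile, binsizesec, kwargs) tuple per input light curve
    return [(lcfile, binsizesec, kwargs) for lcfile in lclist]


def demo_worker(task):
    # hypothetical stand-in for a real worker: unpack the task and
    # return the output file name, or None if anything goes wrong
    try:
        lcfile, binsizesec, kwargs = task
        outdir = kwargs.get('outdir') or os.path.dirname(lcfile)
        outname = 'binned-%ds-%s' % (int(binsizesec), os.path.basename(lcfile))
        return os.path.join(outdir, outname)
    except Exception:
        return None


def run_parallel(tasks, nworkers=2, maxworkertasks=1000):
    # maxtasksperchild replaces each worker process after that many tasks
    with Pool(nworkers, maxtasksperchild=maxworkertasks) as pool:
        results = pool.map(demo_worker, tasks)
    # keys = input LCs, vals = output LCs
    return {task[0]: result for task, result in zip(tasks, results)}
```

On platforms that start workers by spawning a fresh interpreter, run_parallel should be called from under an `if __name__ == '__main__':` guard.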

astrobase.lcproc.lcbin.
parallel_timebin_lcdir
(lcdir, binsizesec, maxobjects=None, outdir=None, lcformat='hatsql', lcformatdir=None, timecols=None, magcols=None, errcols=None, minbinelems=7, nworkers=2, maxworkertasks=1000)[source]¶ This time bins all the light curves in the specified directory.
Parameters:  lcdir (str) – Directory containing the input LCs to process.
 binsizesec (float) – The time bin size to use in seconds.
 maxobjects (int or None) – If provided, LC processing will stop at lclist[maxobjects].
 outdir (str or None) – The directory where output LCs will be written. If None, will write to the same directory as the input LCs.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curve file.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 timecols,magcols,errcols (lists of str) – The keys in the lcdict produced by your light curve reader function that correspond to the times, mags/fluxes, and associated measurement errors that will be used as inputs to the binning process. If these are None, the default values for timecols, magcols, and errcols for your light curve format will be used here.
 minbinelems (int) – The minimum number of timebin elements required to accept a timebin as valid for the output binned light curve.
 nworkers (int) – Number of parallel workers to launch.
 maxworkertasks (int) – The maximum number of tasks a parallel worker will complete before being replaced to guard against memory leaks.
Returns: The returned dict contains keys = input LCs, vals = output LCs.
Return type: dict
astrobase.lcproc.lcpfeatures module¶
This contains functions to generate periodic light curve features for later variable star classification.

astrobase.lcproc.lcpfeatures.
get_periodicfeatures
(pfpickle, lcbasedir, outdir, fourierorder=5, transitparams=(0.01, 0.1, 0.1), ebparams=(0.2, 0.3, 0.7, 0.5), pdiff_threshold=0.0001, sidereal_threshold=0.0001, sampling_peak_multiplier=5.0, sampling_startp=None, sampling_endp=None, starfeatures=None, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, sigclip=10.0, verbose=True, raiseonfail=False)[source]¶ This gets all periodic features for the object.
Parameters:  pfpickle (str) – The periodfinding result pickle containing periodfinder results to use for the calculation of LC fit, periodogram, and phased LC features.
 lcbasedir (str) – The base directory where the light curve for the current object is located.
 outdir (str) – The output directory where the results will be written.
 fourierorder (int) – The Fourier order to use to generate sinusoidal function and fit that to the phased light curve.
 transitparams (list of floats) – The transit depth, duration, and ingress duration to use to generate a trapezoid planet transit model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
 ebparams (list of floats) – The primary eclipse depth, eclipse duration, the primary-secondary depth ratio, and the phase of the secondary eclipse to use to generate an eclipsing binary model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
 pdiff_threshold (float) – This is the max difference between periods to consider them the same.
 sidereal_threshold (float) – This is the max difference between any of the ‘best’ periods and the sidereal day periods to consider them the same.
 sampling_peak_multiplier (float) – This is the minimum multiplicative factor of a ‘best’ period’s normalized periodogram peak over the sampling periodogram peak at the same period required to accept the ‘best’ period as possibly real.
 sampling_startp,sampling_endp (float or None) – If the pgramlist doesn’t have a time-sampling Lomb-Scargle periodogram, it will be obtained automatically. Use these kwargs to control the minimum and maximum period interval to be searched when generating this periodogram.
 starfeatures (str or None) – If not None, this should be the filename of the starfeatures-<objectid>.pkl created by astrobase.lcproc.lcsfeatures.get_starfeatures() for this object. This is used to get the neighbor’s light curve and phase it with this object’s period to see if this object is blended.
 timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
 magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
 errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input time-series.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the time-series (with non-finite elems removed) will be passed through to the output.
 verbose (bool) – If True, will indicate progress while working.
 raiseonfail (bool) – If True, will raise an Exception if something goes wrong.
Returns: Returns a filename for the output pickle containing all of the periodic features for the input object’s LC.
Return type: str
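The asymmetric sigma-clip described for sigclip depends on whether the values are magnitudes (larger means fainter) or fluxes (smaller means fainter). The sketch below illustrates that logic only; it is not the Astrobase implementation. The function name asymmetric_sigclip and the choice of the median and 1.483 times the median absolute deviation as the center and scatter estimates are assumptions of this example.

```python
import numpy as np


def asymmetric_sigclip(times, mags, errs, sigclip=(10.0, 3.0), magsarefluxes=False):
    """Clip dimmings and brightenings with separate sigma limits."""
    times, mags, errs = (np.asarray(a, dtype=float) for a in (times, mags, errs))

    # non-finite elements are always removed
    finite = np.isfinite(times) & np.isfinite(mags) & np.isfinite(errs)
    times, mags, errs = times[finite], mags[finite], errs[finite]
    if sigclip is None:
        return times, mags, errs

    if np.isscalar(sigclip):
        dim_sigma = bright_sigma = float(sigclip)   # symmetric clip
    else:
        dim_sigma, bright_sigma = (float(s) for s in sigclip)

    median = np.median(mags)
    # robust scatter estimate (an assumption of this sketch)
    stddev = 1.483 * np.median(np.abs(mags - median))
    deviation = mags - median

    if magsarefluxes:
        # fluxes: a dimming is a lower value
        dimming, brightening = -deviation, deviation
    else:
        # magnitudes: a dimming is a larger value
        dimming, brightening = deviation, -deviation

    keep = (dimming <= dim_sigma * stddev) & (brightening <= bright_sigma * stddev)
    return times[keep], mags[keep], errs[keep]
```

With sigclip=[10., 3.], the same outlier value can be kept or removed depending on magsarefluxes, which is why that flag has to match the data.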

astrobase.lcproc.lcpfeatures.
serial_periodicfeatures
(pfpkl_list, lcbasedir, outdir, starfeaturesdir=None, fourierorder=5, transitparams=(0.01, 0.1, 0.1), ebparams=(0.2, 0.3, 0.7, 0.5), pdiff_threshold=0.0001, sidereal_threshold=0.0001, sampling_peak_multiplier=5.0, sampling_startp=None, sampling_endp=None, starfeatures=None, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, sigclip=10.0, verbose=False, maxobjects=None)[source]¶ This drives the periodicfeatures collection for a list of periodfinding pickles.
Parameters:  pfpkl_list (list of str) – The list of periodfinding pickles to use.
 lcbasedir (str) – The base directory where the associated light curves are located.
 outdir (str) – The directory where the results will be written.
 starfeaturesdir (str or None) – The directory containing the starfeatures-<objectid>.pkl files for each object to use to calculate neighbor proximity light curve features.
 fourierorder (int) – The Fourier order to use to generate sinusoidal function and fit that to the phased light curve.
 transitparams (list of floats) – The transit depth, duration, and ingress duration to use to generate a trapezoid planet transit model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
 ebparams (list of floats) – The primary eclipse depth, eclipse duration, the primary-secondary depth ratio, and the phase of the secondary eclipse to use to generate an eclipsing binary model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
 pdiff_threshold (float) – This is the max difference between periods to consider them the same.
 sidereal_threshold (float) – This is the max difference between any of the ‘best’ periods and the sidereal day periods to consider them the same.
 sampling_peak_multiplier (float) – This is the minimum multiplicative factor of a ‘best’ period’s normalized periodogram peak over the sampling periodogram peak at the same period required to accept the ‘best’ period as possibly real.
 sampling_startp,sampling_endp (float or None) – If the pgramlist doesn’t have a time-sampling Lomb-Scargle periodogram, it will be obtained automatically. Use these kwargs to control the minimum and maximum period interval to be searched when generating this periodogram.
 timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
 magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
 errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input time-series.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the time-series (with non-finite elems removed) will be passed through to the output.
 verbose (bool) – If True, will indicate progress while working.
 maxobjects (int) – The total number of objects to process from pfpkl_list.
Returns: Nothing.

astrobase.lcproc.lcpfeatures.
parallel_periodicfeatures
(pfpkl_list, lcbasedir, outdir, starfeaturesdir=None, fourierorder=5, transitparams=(0.01, 0.1, 0.1), ebparams=(0.2, 0.3, 0.7, 0.5), pdiff_threshold=0.0001, sidereal_threshold=0.0001, sampling_peak_multiplier=5.0, sampling_startp=None, sampling_endp=None, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, sigclip=10.0, verbose=False, maxobjects=None, nworkers=2)[source]¶ This runs periodic feature generation in parallel for all periodfinding pickles in the input list.
Parameters:  pfpkl_list (list of str) – The list of periodfinding pickles to use.
 lcbasedir (str) – The base directory where the associated light curves are located.
 outdir (str) – The directory where the results will be written.
 starfeaturesdir (str or None) – The directory containing the starfeatures-<objectid>.pkl files for each object to use to calculate neighbor proximity light curve features.
 fourierorder (int) – The Fourier order to use to generate sinusoidal function and fit that to the phased light curve.
 transitparams (list of floats) – The transit depth, duration, and ingress duration to use to generate a trapezoid planet transit model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
 ebparams (list of floats) – The primary eclipse depth, eclipse duration, the primary-secondary depth ratio, and the phase of the secondary eclipse to use to generate an eclipsing binary model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
 pdiff_threshold (float) – This is the max difference between periods to consider them the same.
 sidereal_threshold (float) – This is the max difference between any of the ‘best’ periods and the sidereal day periods to consider them the same.
 sampling_peak_multiplier (float) – This is the minimum multiplicative factor of a ‘best’ period’s normalized periodogram peak over the sampling periodogram peak at the same period required to accept the ‘best’ period as possibly real.
 sampling_startp,sampling_endp (float or None) – If the pgramlist doesn’t have a time-sampling Lomb-Scargle periodogram, it will be obtained automatically. Use these kwargs to control the minimum and maximum period interval to be searched when generating this periodogram.
 timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
 magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
 errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input time-series.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the time-series (with non-finite elems removed) will be passed through to the output.
 verbose (bool) – If True, will indicate progress while working.
 maxobjects (int) – The total number of objects to process from pfpkl_list.
 nworkers (int) – The number of parallel workers to launch to process the input.
Returns: A dict containing key: val pairs of the input periodfinder result and the output periodic feature result pickles for each input pickle is returned.
Return type: dict
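The three threshold parameters (pdiff_threshold, sidereal_threshold, sampling_peak_multiplier) all describe simple comparisons. The sketch below shows one reading of them, assuming absolute period differences in days and a single sidereal-day reference period; the helper names and these interpretations are assumptions of this example, not the checks Astrobase actually performs.

```python
# length of one sidereal day in days (86164.09 s / 86400 s)
SIDEREAL_DAY_DAYS = 0.99726957


def periods_match(period_a, period_b, pdiff_threshold=1.0e-4):
    # two periods count as the same if they differ by less than the threshold
    return abs(period_a - period_b) < pdiff_threshold


def is_sidereal_alias(period, sidereal_threshold=1.0e-4):
    # flag a 'best' period that sits on the sidereal day
    return periods_match(period, SIDEREAL_DAY_DAYS, sidereal_threshold)


def peak_beats_sampling(best_peak, sampling_peak, sampling_peak_multiplier=5.0):
    # accept the peak only if it is at least `multiplier` times the
    # time-sampling periodogram's power at the same period
    return best_peak >= sampling_peak_multiplier * sampling_peak
```

A candidate period that fails the last check is more likely to be an imprint of the observing cadence than a property of the star.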

astrobase.lcproc.lcpfeatures.
parallel_periodicfeatures_lcdir
(pfpkl_dir, lcbasedir, outdir, pfpkl_glob='periodfinding*.pkl*', starfeaturesdir=None, fourierorder=5, transitparams=(0.01, 0.1, 0.1), ebparams=(0.2, 0.3, 0.7, 0.5), pdiff_threshold=0.0001, sidereal_threshold=0.0001, sampling_peak_multiplier=5.0, sampling_startp=None, sampling_endp=None, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, sigclip=10.0, verbose=False, maxobjects=None, nworkers=2, recursive=True)[source]¶ This runs parallel periodicfeature extraction for a directory of periodfinding result pickles.
Parameters:  pfpkl_dir (str) – The directory containing the pickles to process.
 lcbasedir (str) – The directory where all of the associated light curve files are located.
 outdir (str) – The directory where all the output will be written.
 pfpkl_glob (str) – The UNIX file glob to use to search for periodfinder result pickles in pfpkl_dir.
 starfeaturesdir (str or None) – The directory containing the starfeatures-<objectid>.pkl files for each object to use to calculate neighbor proximity light curve features.
 fourierorder (int) – The Fourier order to use to generate sinusoidal function and fit that to the phased light curve.
 transitparams (list of floats) – The transit depth, duration, and ingress duration to use to generate a trapezoid planet transit model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
 ebparams (list of floats) – The primary eclipse depth, eclipse duration, the primary-secondary depth ratio, and the phase of the secondary eclipse to use to generate an eclipsing binary model fit to the phased light curve. The period used is the one provided in period, while the epoch is automatically obtained from a spline fit to the phased light curve.
 pdiff_threshold (float) – This is the max difference between periods to consider them the same.
 sidereal_threshold (float) – This is the max difference between any of the ‘best’ periods and the sidereal day periods to consider them the same.
 sampling_peak_multiplier (float) – This is the minimum multiplicative factor of a ‘best’ period’s normalized periodogram peak over the sampling periodogram peak at the same period required to accept the ‘best’ period as possibly real.
 sampling_startp,sampling_endp (float or None) – If the pgramlist doesn’t have a time-sampling Lomb-Scargle periodogram, it will be obtained automatically. Use these kwargs to control the minimum and maximum period interval to be searched when generating this periodogram.
 timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
 magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
 errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input time-series.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the time-series (with non-finite elems removed) will be passed through to the output.
 verbose (bool) – If True, will indicate progress while working.
 maxobjects (int) – The total number of objects to process from pfpkl_list.
 nworkers (int) – The number of parallel workers to launch to process the input.
Returns: A dict containing key: val pairs of the input periodfinder result and the output periodic feature result pickles for each input pickle is returned.
Return type: dict
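How a UNIX file glob such as pfpkl_glob selects files in a directory, with or without descending into subdirectories, can be sketched using only the standard library. The function find_pickles is hypothetical, and treating recursive as "also search subdirectories" is an assumption of this example, since the parameter list does not describe that keyword.

```python
import fnmatch
import os


def find_pickles(pfpkl_dir, pfpkl_glob='periodfinding*.pkl*', recursive=True):
    """Collect files under pfpkl_dir whose names match the glob pattern."""
    matches = []
    if recursive:
        # walk the whole tree below pfpkl_dir
        for dirpath, _dirnames, filenames in os.walk(pfpkl_dir):
            for name in fnmatch.filter(filenames, pfpkl_glob):
                matches.append(os.path.join(dirpath, name))
    else:
        # look at the top-level directory only
        for name in fnmatch.filter(os.listdir(pfpkl_dir), pfpkl_glob):
            matches.append(os.path.join(pfpkl_dir, name))
    return sorted(matches)
```

The trailing `*` in the default pattern also picks up compressed pickles such as files ending in `.pkl.gz`.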
astrobase.lcproc.lcsfeatures module¶
This contains functions to obtain various star magnitude and color features for large numbers of light curves. Useful later for variable star classification.

astrobase.lcproc.lcsfeatures.
get_starfeatures
(lcfile, outdir, kdtree, objlist, lcflist, neighbor_radius_arcsec, deredden=True, custom_bandpasses=None, lcformat='hatsql', lcformatdir=None)[source]¶ This runs the functions from astrobase.varclass.starfeatures() on a single light curve file.
Parameters:  lcfile (str) – This is the LC file to extract star features for.
 outdir (str) – This is the directory to write the output pickle to.
 kdtree (scipy.spatial.cKDTree) – This is a scipy.spatial.KDTree or cKDTree used to calculate neighbor proximity features. This is for the light curve catalog this object is in.
 objlist (np.array) – This is a Numpy array of object IDs in the same order as the kdtree.data np.array. This is for the light curve catalog this object is in.
 lcflist (np.array) – This is a Numpy array of light curve filenames in the same order as kdtree.data. This is for the light curve catalog this object is in.
 neighbor_radius_arcsec (float) – This indicates the radius in arcsec to search for neighbors for this object using the light curve catalog’s kdtree, objlist, lcflist, and in GAIA.
 deredden (bool) – This controls if the colors and any color classifications will be dereddened using 2MASS DUST.
 custom_bandpasses (dict or None) –
This is a dict used to define any custom bandpasses in the in_objectinfo dict you want to make this function aware of and generate colors for. Use the format below for this dict:
{
'<bandpass_key_1>':{'dustkey':'<twomass_dust_key_1>',
                    'label':'<band_label_1>',
                    'colors':[['<bandkey1>-<bandkey2>',
                               '<BAND1> - <BAND2>'],
                              ['<bandkey3>-<bandkey4>',
                               '<BAND3> - <BAND4>']]},
.
...
.
'<bandpass_key_N>':{'dustkey':'<twomass_dust_key_N>',
                    'label':'<band_label_N>',
                    'colors':[['<bandkey1>-<bandkey2>',
                               '<BAND1> - <BAND2>'],
                              ['<bandkey3>-<bandkey4>',
                               '<BAND3> - <BAND4>']]},
}
Where:
bandpass_key is a key to use to refer to this bandpass in the objectinfo dict, e.g. ‘sdssg’ for SDSS g band
twomass_dust_key is the key to use in the 2MASS DUST result table for reddening per bandpass. For example, given the following DUST result table (using http://irsa.ipac.caltech.edu/applications/DUST/):
Filter_name  LamEff   A_over_E_B_V_SandF  A_SandF  A_over_E_B_V_SFD  A_SFD
char         float    float               float    float             float
             microns                      mags                       mags
CTIO U       0.3734   4.107               0.209    4.968             0.253
CTIO B       0.4309   3.641               0.186    4.325             0.221
CTIO V       0.5517   2.682               0.137    3.240             0.165
.
.
...
The twomass_dust_key for ‘vmag’ would be ‘CTIO V’. If you want to skip DUST lookup and want to pass in a specific reddening magnitude for your bandpass, use a float for the value of twomass_dust_key. If you want to skip DUST lookup entirely for this bandpass, use None for the value of twomass_dust_key.
band_label is the label to use for this bandpass, e.g. ‘W1’ for WISE1 band, ‘u’ for SDSS u, etc.
The ‘colors’ list contains color definitions for all colors you want to generate using this bandpass. This list contains elements of the form:
['<bandkey1>-<bandkey2>','<BAND1> - <BAND2>']
where the first item is the bandpass keys making up this color, and the second item is the label for this color to be used by the frontends. An example:
['sdssu-sdssg','u - g']
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
Returns: Path to the output pickle containing all of the star features for this object.
Return type: str
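A concrete custom_bandpasses dict may be easier to read than the template in the parameter description. The example below reuses the 'vmag' key and the ‘CTIO V’ and ‘CTIO B’ filter names from the DUST table shown there; the other bandpass keys, the labels, the reddening value, and the check_custom_bandpasses helper are hypothetical, and color keys are assumed to join two bandpass keys with a hyphen.

```python
custom_bandpasses = {
    # dustkey as a str: look up this row in the DUST result table
    'vmag': {'dustkey': 'CTIO V', 'label': 'V',
             'colors': [['bmag-vmag', 'B - V']]},
    'bmag': {'dustkey': 'CTIO B', 'label': 'B',
             'colors': [['bmag-vmag', 'B - V']]},
    # dustkey as a float: use this reddening magnitude directly
    'mymag': {'dustkey': 0.12, 'label': 'M', 'colors': []},
    # dustkey as None: skip the DUST lookup for this bandpass
    'othermag': {'dustkey': None, 'label': 'O', 'colors': []},
}


def check_custom_bandpasses(bandpasses):
    """Return a list of layout problems; an empty list means none were found."""
    problems = []
    for key, spec in bandpasses.items():
        missing = {'dustkey', 'label', 'colors'} - set(spec)
        if missing:
            problems.append('%s: missing %s' % (key, sorted(missing)))
            continue
        dustkey = spec['dustkey']
        if not (dustkey is None or isinstance(dustkey, (str, float))):
            problems.append('%s: dustkey must be str, float, or None' % key)
        for color in spec['colors']:
            # each color is [<key1>-<key2>, <display label>]
            if len(color) != 2 or '-' not in color[0]:
                problems.append('%s: malformed color %r' % (key, color))
    return problems
```

Running a check like this before launching a long batch job catches a misspelled key early, before any light curves are processed.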

astrobase.lcproc.lcsfeatures.
serial_starfeatures
(lclist, outdir, lc_catalog_pickle, neighbor_radius_arcsec, maxobjects=None, deredden=True, custom_bandpasses=None, lcformat='hatsql', lcformatdir=None)[source]¶ This drives the get_starfeatures function for a collection of LCs.
Parameters:  lclist (list of str) – The list of light curve file names to process.
 outdir (str) – The output directory where the results will be placed.
 lc_catalog_pickle (str) –
The path to a catalog containing a dict with at least:
 an object ID array accessible with dict[‘objects’][‘objectid’]
 an LC filename array accessible with dict[‘objects’][‘lcfname’]
 a scipy.spatial.KDTree or cKDTree object to use for finding neighbors for each object accessible with dict[‘kdtree’]
A catalog pickle of the form needed can be produced using astrobase.lcproc.catalogs.make_lclist() or astrobase.lcproc.catalogs.filter_lclist().
 neighbor_radius_arcsec (float) – This indicates the radius in arcsec to search for neighbors for this object using the light curve catalog’s kdtree, objlist, lcflist, and in GAIA.
 maxobjects (int) – The number of objects to process from lclist.
 deredden (bool) – This controls if the colors and any color classifications will be dereddened using 2MASS DUST.
 custom_bandpasses (dict or None) –
This is a dict used to define any custom bandpasses in the in_objectinfo dict you want to make this function aware of and generate colors for. Use the format below for this dict:
{
'<bandpass_key_1>':{'dustkey':'<twomass_dust_key_1>',
                    'label':'<band_label_1>',
                    'colors':[['<bandkey1>-<bandkey2>',
                               '<BAND1> - <BAND2>'],
                              ['<bandkey3>-<bandkey4>',
                               '<BAND3> - <BAND4>']]},
.
...
.
'<bandpass_key_N>':{'dustkey':'<twomass_dust_key_N>',
                    'label':'<band_label_N>',
                    'colors':[['<bandkey1>-<bandkey2>',
                               '<BAND1> - <BAND2>'],
                              ['<bandkey3>-<bandkey4>',
                               '<BAND3> - <BAND4>']]},
}
Where:
bandpass_key is a key to use to refer to this bandpass in the objectinfo dict, e.g. ‘sdssg’ for SDSS g band
twomass_dust_key is the key to use in the 2MASS DUST result table for reddening per bandpass. For example, given the following DUST result table (using http://irsa.ipac.caltech.edu/applications/DUST/):
|Filter_name|LamEff |A_over_E_B_V_SandF|A_SandF|A_over_E_B_V_SFD|A_SFD|
|char       |float  |float             |float  |float           |float|
|           |microns|                  |mags   |                |mags |
 CTIO U       0.3734              4.107   0.209            4.968 0.253
 CTIO B       0.4309              3.641   0.186            4.325 0.221
 CTIO V       0.5517              2.682   0.137            3.240 0.165
.
.
...
The twomass_dust_key for ‘vmag’ would be ‘CTIO V’. If you want to skip DUST lookup and want to pass in a specific reddening magnitude for your bandpass, use a float for the value of twomass_dust_key. If you want to skip DUST lookup entirely for this bandpass, use None for the value of twomass_dust_key.
band_label is the label to use for this bandpass, e.g. ‘W1’ for WISE1 band, ‘u’ for SDSS u, etc.
The ‘colors’ list contains color definitions for all colors you want to generate using this bandpass. This list contains elements of the form:
['<bandkey1>-<bandkey2>','<BAND1> - <BAND2>']
where the first item is the bandpass keys making up this color, and the second item is the label for this color to be used by the frontends. An example:
['sdssu-sdssg','u - g']
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
Returns: A list of all star features pickles produced.
Return type: list of str
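As an illustration of the custom_bandpasses format described above, here is a minimal sketch that defines one custom bandpass and checks its structure before it is handed to a driver. The 'myband' key, its label, its color definitions, and the check_custom_bandpasses helper are all hypothetical and not part of astrobase. The sketch also assumes that a color key joins two bandpass keys with a hyphen.

```python
# Hypothetical bandpass definition following the documented format.
custom_bandpasses = {
    'myband': {
        'dustkey': 'CTIO V',   # str: DUST table key; a float or None also works
        'label': 'M',
        'colors': [['myband-sdssr', 'M - r'],
                   ['sdssg-myband', 'g - M']],
    },
}


def check_custom_bandpasses(bandpasses):
    """Return a list of problems found in a custom_bandpasses dict.

    Illustrative helper only; astrobase does not provide this function.
    """
    problems = []
    for bandkey, spec in bandpasses.items():
        for required in ('dustkey', 'label', 'colors'):
            if required not in spec:
                problems.append('%s: missing %r' % (bandkey, required))
        dustkey = spec.get('dustkey')
        if not (dustkey is None or isinstance(dustkey, (str, float))):
            problems.append('%s: dustkey must be str, float, or None' % bandkey)
        for color in spec.get('colors', []):
            if len(color) != 2:
                problems.append('%s: bad color definition %r' % (bandkey, color))
            elif bandkey not in color[0].split('-'):
                problems.append('%s: color %r does not use this bandpass'
                                % (bandkey, color[0]))
    return problems


print(check_custom_bandpasses(custom_bandpasses) or 'custom_bandpasses looks OK')

# With astrobase installed, the dict would then be passed along, e.g.:
# from astrobase.lcproc.lcsfeatures import serial_starfeatures
# pickles = serial_starfeatures(lclist, outdir, lc_catalog_pickle, 10.0,
#                               custom_bandpasses=custom_bandpasses)
```

Checking the dict up front is cheap compared to a long batch run that fails partway through because of a misspelled key.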

astrobase.lcproc.lcsfeatures.
parallel_starfeatures
(lclist, outdir, lc_catalog_pickle, neighbor_radius_arcsec, maxobjects=None, deredden=True, custom_bandpasses=None, lcformat='hatsql', lcformatdir=None, nworkers=2)[source]¶ This runs get_starfeatures in parallel for all light curves in lclist.
Parameters:  lclist (list of str) – The list of light curve file names to process.
 outdir (str) – The output directory where the results will be placed.
 lc_catalog_pickle (str) –
The path to a catalog pickle containing a dict with at least:
 an object ID array accessible with dict[‘objects’][‘objectid’]
 an LC filename array accessible with dict[‘objects’][‘lcfname’]
 a scipy.spatial.KDTree or cKDTree object to use for finding neighbors for each object accessible with dict[‘kdtree’]
A catalog pickle of the form needed can be produced using astrobase.lcproc.catalogs.make_lclist() or astrobase.lcproc.catalogs.filter_lclist().
 neighbor_radius_arcsec (float) – This indicates the radius in arcsec to search for neighbors for this object using the light curve catalog’s kdtree, objlist, lcflist, and in GAIA.
 maxobjects (int) – The number of objects to process from lclist.
 deredden (bool) – This controls if the colors and any color classifications will be dereddened using 2MASS DUST.
 custom_bandpasses (dict or None) –
This is a dict used to define any custom bandpasses in the in_objectinfo dict you want to make this function aware of and generate colors for. Use the format below for this dict:
{
'<bandpass_key_1>':{'dustkey':'<twomass_dust_key_1>',
                    'label':'<band_label_1>',
                    'colors':[['<bandkey1>-<bandkey2>',
                               '<BAND1> - <BAND2>'],
                              ['<bandkey3>-<bandkey4>',
                               '<BAND3> - <BAND4>']]},
.
...
.
'<bandpass_key_N>':{'dustkey':'<twomass_dust_key_N>',
                    'label':'<band_label_N>',
                    'colors':[['<bandkey1>-<bandkey2>',
                               '<BAND1> - <BAND2>'],
                              ['<bandkey3>-<bandkey4>',
                               '<BAND3> - <BAND4>']]},
}
Where:
bandpass_key is a key to use to refer to this bandpass in the objectinfo dict, e.g. ‘sdssg’ for SDSS g band
twomass_dust_key is the key to use in the 2MASS DUST result table for reddening per bandpass. For example, given the following DUST result table (using http://irsa.ipac.caltech.edu/applications/DUST/):
|Filter_name|LamEff |A_over_E_B_V_SandF|A_SandF|A_over_E_B_V_SFD|A_SFD|
|char       |float  |float             |float  |float           |float|
|           |microns|                  |mags   |                |mags |
 CTIO U       0.3734              4.107   0.209            4.968 0.253
 CTIO B       0.4309              3.641   0.186            4.325 0.221
 CTIO V       0.5517              2.682   0.137            3.240 0.165
.
.
...
The twomass_dust_key for ‘vmag’ would be ‘CTIO V’. If you want to skip DUST lookup and want to pass in a specific reddening magnitude for your bandpass, use a float for the value of twomass_dust_key. If you want to skip DUST lookup entirely for this bandpass, use None for the value of twomass_dust_key.
band_label is the label to use for this bandpass, e.g. ‘W1’ for WISE1 band, ‘u’ for SDSS u, etc.
The ‘colors’ list contains color definitions for all colors you want to generate using this bandpass. This list contains elements of the form:
['<bandkey1>-<bandkey2>','<BAND1> - <BAND2>']
where the first item is the bandpass keys making up this color, and the second item is the label for this color to be used by the frontends. An example:
['sdssu-sdssg','u - g']
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 nworkers (int) – The number of parallel workers to launch.
Returns: A dict with key:val pairs of the input light curve filename and the output star features pickle for each LC processed.
Return type: dict

astrobase.lcproc.lcsfeatures.
parallel_starfeatures_lcdir
(lcdir, outdir, lc_catalog_pickle, neighbor_radius_arcsec, fileglob=None, maxobjects=None, deredden=True, custom_bandpasses=None, lcformat='hatsql', lcformatdir=None, nworkers=2, recursive=True)[source]¶ This runs parallel star feature extraction for a directory of LCs.
Parameters:  lcdir (str) – The directory to search for light curves.
 outdir (str) – The output directory where the results will be placed.
 lc_catalog_pickle (str) –
The path to a catalog pickle containing a dict with at least:
 an object ID array accessible with dict[‘objects’][‘objectid’]
 an LC filename array accessible with dict[‘objects’][‘lcfname’]
 a scipy.spatial.KDTree or cKDTree object to use for finding neighbors for each object accessible with dict[‘kdtree’]
A catalog pickle of the form needed can be produced using astrobase.lcproc.catalogs.make_lclist() or astrobase.lcproc.catalogs.filter_lclist().
 neighbor_radius_arcsec (float) – This indicates the radius in arcsec to search for neighbors for this object using the light curve catalog’s kdtree, objlist, lcflist, and in GAIA.
 fileglob (str) – The UNIX file glob to use to search for the light curves in lcdir. If None, the default value for the light curve format specified will be used.
 maxobjects (int) – The number of objects to process from the light curves found in lcdir.
 deredden (bool) – This controls if the colors and any color classifications will be dereddened using 2MASS DUST.
 custom_bandpasses (dict or None) –
This is a dict used to define any custom bandpasses in the in_objectinfo dict you want to make this function aware of and generate colors for. Use the format below for this dict:
{
'<bandpass_key_1>':{'dustkey':'<twomass_dust_key_1>',
                    'label':'<band_label_1>',
                    'colors':[['<bandkey1>-<bandkey2>',
                               '<BAND1> - <BAND2>'],
                              ['<bandkey3>-<bandkey4>',
                               '<BAND3> - <BAND4>']]},
.
...
.
'<bandpass_key_N>':{'dustkey':'<twomass_dust_key_N>',
                    'label':'<band_label_N>',
                    'colors':[['<bandkey1>-<bandkey2>',
                               '<BAND1> - <BAND2>'],
                              ['<bandkey3>-<bandkey4>',
                               '<BAND3> - <BAND4>']]},
}
Where:
bandpass_key is a key to use to refer to this bandpass in the objectinfo dict, e.g. ‘sdssg’ for SDSS g band
twomass_dust_key is the key to use in the 2MASS DUST result table for reddening per bandpass. For example, given the following DUST result table (using http://irsa.ipac.caltech.edu/applications/DUST/):
|Filter_name|LamEff |A_over_E_B_V_SandF|A_SandF|A_over_E_B_V_SFD|A_SFD|
|char       |float  |float             |float  |float           |float|
|           |microns|                  |mags   |                |mags |
 CTIO U       0.3734              4.107   0.209            4.968 0.253
 CTIO B       0.4309              3.641   0.186            4.325 0.221
 CTIO V       0.5517              2.682   0.137            3.240 0.165
.
.
...
The twomass_dust_key for ‘vmag’ would be ‘CTIO V’. If you want to skip DUST lookup and want to pass in a specific reddening magnitude for your bandpass, use a float for the value of twomass_dust_key. If you want to skip DUST lookup entirely for this bandpass, use None for the value of twomass_dust_key.
band_label is the label to use for this bandpass, e.g. ‘W1’ for WISE1 band, ‘u’ for SDSS u, etc.
The ‘colors’ list contains color definitions for all colors you want to generate using this bandpass. This list contains elements of the form:
['<bandkey1>-<bandkey2>','<BAND1> - <BAND2>']
where the first item is the bandpass keys making up this color, and the second item is the label for this color to be used by the frontends. An example:
['sdssu-sdssg','u - g']
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 nworkers (int) – The number of parallel workers to launch.
Returns: A dict with key:val pairs of the input light curve filename and the output star features pickle for each LC processed.
Return type: dict
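The way lcdir, fileglob, recursive, and maxobjects interact can be pictured with a small file-collection sketch. It uses only the standard library and is not the code astrobase runs. The find_lcfiles helper and the '*.lc' glob in the usage comment are hypothetical, and sorting the matches is a choice made here for reproducibility.

```python
import fnmatch
import os


def find_lcfiles(lcdir, fileglob, recursive=True, maxobjects=None):
    """Collect light curve paths under lcdir that match fileglob."""
    matches = []
    if recursive:
        # Walk the whole tree below lcdir.
        for root, _dirs, files in os.walk(lcdir):
            for fname in fnmatch.filter(files, fileglob):
                matches.append(os.path.join(root, fname))
    else:
        # Only look at regular files directly inside lcdir.
        for fname in fnmatch.filter(os.listdir(lcdir), fileglob):
            path = os.path.join(lcdir, fname)
            if os.path.isfile(path):
                matches.append(path)
    matches.sort()
    # maxobjects=None means no limit on the number of files returned.
    return matches if maxobjects is None else matches[:maxobjects]


# Example (hypothetical directory and glob):
# lclist = find_lcfiles('/data/lcs', '*.lc', recursive=True, maxobjects=100)
```

The resulting list has the same shape as the lclist argument taken by the list-based drivers.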
astrobase.lcproc.lcvfeatures module¶
This contains functions to generate variability features for large collections of light curves. Useful later for variable star classification.

astrobase.lcproc.lcvfeatures.
get_varfeatures
(lcfile, outdir, timecols=None, magcols=None, errcols=None, mindet=1000, lcformat='hatsql', lcformatdir=None)[source]¶ This runs
astrobase.varclass.varfeatures.all_nonperiodic_features()
on a single LC file.
Parameters:  lcfile (str) – The input light curve to process.
 outdir (str) – The directory where the output variability features pickle will be written.
 timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
 magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
 errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
 mindet (int) – The minimum number of LC points required to generate variability features.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
Returns: The generated variability features pickle for the input LC, with results for each magcol in the input magcol or light curve format’s default magcol list.
Return type: str
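To make the role of mindet concrete, here is a rough sketch of a feature calculation that refuses to run on sparse light curves. It is not astrobase.varclass.varfeatures.all_nonperiodic_features(): the simple_varfeatures helper and the handful of statistics it returns are hypothetical stand-ins, and counting only finite points toward mindet is an assumption.

```python
import math
import statistics


def simple_varfeatures(mags, mindet=1000):
    """Return a few basic variability features, or None if too few points."""
    finite = [m for m in mags if math.isfinite(m)]
    # Need at least two points for a standard deviation in any case.
    if len(finite) < max(mindet, 2):
        return None
    median = statistics.median(finite)
    # Median absolute deviation: a robust measure of scatter.
    mad = statistics.median(abs(m - median) for m in finite)
    return {
        'ndet': len(finite),
        'median': median,
        'mad': mad,
        'stdev': statistics.stdev(finite),
    }


print(simple_varfeatures([10.0, 10.1]))                        # too few points
print(simple_varfeatures([10.0, 10.2, 10.1, 10.4, 10.0], mindet=5))
```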

astrobase.lcproc.lcvfeatures.
serial_varfeatures
(lclist, outdir, maxobjects=None, timecols=None, magcols=None, errcols=None, mindet=1000, lcformat='hatsql', lcformatdir=None)[source]¶ This runs variability feature extraction for a list of LCs.
Parameters:  lclist (list of str) – The list of light curve file names to process.
 outdir (str) – The directory where the output varfeatures pickle files will be written.
 maxobjects (int) – The number of LCs to process from lclist.
 timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
 magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
 errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
 mindet (int) – The minimum number of LC points required to generate variability features.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
Returns: List of the generated variability features pickles for the input LCs, with results for each magcol in the input magcol or light curve format’s default magcol list.
Return type: list of str

astrobase.lcproc.lcvfeatures.
parallel_varfeatures
(lclist, outdir, maxobjects=None, timecols=None, magcols=None, errcols=None, mindet=1000, lcformat='hatsql', lcformatdir=None, nworkers=2)[source]¶ This runs variability feature extraction in parallel for all LCs in lclist.
Parameters:  lclist (list of str) – The list of light curve file names to process.
 outdir (str) – The directory where the output varfeatures pickle files will be written.
 maxobjects (int) – The number of LCs to process from lclist.
 timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
 magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
 errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
 mindet (int) – The minimum number of LC points required to generate variability features.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 nworkers (int) – The number of parallel workers to launch.
Returns: A dict with key:val pairs of input LC file name : the generated variability features pickles for each of the input LCs, with results for each magcol in the input magcol or light curve format’s default magcol list.
Return type: dict

astrobase.lcproc.lcvfeatures.
parallel_varfeatures_lcdir
(lcdir, outdir, fileglob=None, maxobjects=None, timecols=None, magcols=None, errcols=None, recursive=True, mindet=1000, lcformat='hatsql', lcformatdir=None, nworkers=2)[source]¶ This runs parallel variability feature extraction for a directory of LCs.
Parameters:  lcdir (str) – The directory of light curve files to process.
 outdir (str) – The directory where the output varfeatures pickle files will be written.
 fileglob (str or None) – The file glob to use when looking for light curve files in lcdir. If None, the default file glob associated with this LC format will be used.
 maxobjects (int) – The number of LCs to process from the light curves found in lcdir.
 timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
 magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
 errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
 mindet (int) – The minimum number of LC points required to generate variability features.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 nworkers (int) – The number of parallel workers to launch.
Returns: A dict with key:val pairs of input LC file name : the generated variability features pickles for each of the input LCs, with results for each magcol in the input magcol or light curve format’s default magcol list.
Return type: dict
astrobase.lcproc.periodsearch module¶
This contains functions to run periodfinding in a parallelized manner on large collections of light curves.

astrobase.lcproc.periodsearch.
runpf
(lcfile, outdir, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, pfmethods=('gls', 'pdm', 'mav', 'win'), pfkwargs=({}, {}, {}, {}), sigclip=10.0, getblssnr=False, nworkers=2, minobservations=500, excludeprocessed=False, raiseonfail=False)[source]¶ This runs the periodfinding for a single LC.
Parameters:  lcfile (str) – The light curve file to run periodfinding on.
 outdir (str) – The output directory where the result pickle will go.
 timecols (list of str or None) – The timecol keys to use from the lcdict when running period-finding.
 magcols (list of str or None) – The magcol keys to use from the lcdict when running period-finding.
 errcols (list of str or None) – The errcol keys to use from the lcdict when running period-finding.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 pfmethods (list of str) – This is a list of period-finding methods to run. Each element is a string matching one of the keys of the PFMETHODS dict defined in the source of this module. By default, this runs GLS, PDM, AoV-MH, and the spectral window Lomb-Scargle periodogram.
 pfkwargs (list of dicts) – This is used to provide any special kwargs as dicts to each period-finding method function specified in pfmethods, one dict per method, in the same order as pfmethods.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input time-series.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the time-series (with non-finite elems removed) will be passed through to the output.
 getblssnr (bool) – If this is True and BLS is one of the methods specified in pfmethods, will also calculate the stats for each best period in the BLS results: transit depth, duration, ingress duration, refit period and epoch, and the SNR of the transit.
 nworkers (int) – The number of parallel periodfinding workers to launch.
 minobservations (int) – The minimum number of finite LC points required to process a light curve.
 excludeprocessed (bool) –
If this is True, light curves that have existing periodfinding result pickles in outdir will not be processed.
FIXME: currently, this uses a dumb method of excluding already-processed files. A smarter way to do this is to (i) generate a SHA512 cachekey based on a repr of {‘lcfile’, ‘timecols’, ‘magcols’, ‘errcols’, ‘lcformat’, ‘pfmethods’, ‘sigclip’, ‘getblssnr’, ‘pfkwargs’}, (ii) make sure all list kwargs in the dict are sorted, (iii) check if the output file has the same cachekey in its filename (last 8 chars of cachekey should work), so the result was processed in exactly the same way as specified in the input to this function, and can therefore be ignored. Will implement this later.
 raiseonfail (bool) – If something fails and this is True, will raise an Exception instead of returning None at the end.
Returns: The path to the output periodfinding result pickle.
Return type: str
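The sigclip semantics described above (symmetric, asymmetric, or None) can be sketched in plain Python. This is an illustration of the idea, not the routine astrobase uses: the sigclip_mask helper is hypothetical, and estimating the standard deviation as 1.483 times the median absolute deviation is an assumption of this sketch.

```python
import math
import statistics


def sigclip_mask(mags, sigclip, magsarefluxes=False):
    """Return a list of booleans marking the points kept after clipping."""
    finite = [m for m in mags if math.isfinite(m)]
    if sigclip is None or not finite:
        # No clipping: only drop the non-finite points.
        return [math.isfinite(m) for m in mags]
    if isinstance(sigclip, (int, float)):
        dim_sigma = bright_sigma = float(sigclip)   # symmetric clip
    else:
        dim_sigma, bright_sigma = sigclip           # [fainter, brighter]
    median = statistics.median(finite)
    # Robust standard deviation estimate from the MAD (assumption).
    stddev = 1.483 * statistics.median(abs(m - median) for m in finite)
    keep = []
    for m in mags:
        if not math.isfinite(m):
            keep.append(False)
            continue
        # Positive 'dimming' means the star got fainter: a larger
        # magnitude, or a smaller flux.
        dimming = (median - m) if magsarefluxes else (m - median)
        keep.append(-bright_sigma * stddev <= dimming <= dim_sigma * stddev)
    return keep


mags = [10.0, 10.01, 9.99, 10.02, 9.98, 10.0, 10.5, 9.5]
print(sigclip_mask(mags, [30.0, 3.0]))   # keeps the dimming, clips the brightening
```

Because dimming is defined physically, the same sigclip=[30., 3.] setting clips the opposite end of the value range once magsarefluxes is True.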

astrobase.lcproc.periodsearch.
parallel_pf
(lclist, outdir, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, pfmethods=('gls', 'pdm', 'mav', 'win'), pfkwargs=({}, {}, {}, {}), sigclip=10.0, getblssnr=False, nperiodworkers=2, ncontrolworkers=1, liststartindex=None, listmaxobjects=None, minobservations=500, excludeprocessed=True)[source]¶ This drives the overall parallel period processing for a list of LCs.
As a rough benchmark, 25000 HATNet light curves with up to 50000 points per LC take about 26 days in total for an invocation of this function using GLS+PDM+BLS, 10 periodworkers, and 4 controlworkers (so all 40 ‘cores’) on a 2 x Xeon E5-2660v3 machine.
Parameters:  lclist (list of str) – The list of light curve files to process.
 outdir (str) – The output directory where the periodfinding result pickles will go.
 timecols (list of str or None) – The timecol keys to use from the lcdict when running period-finding.
 magcols (list of str or None) – The magcol keys to use from the lcdict when running period-finding.
 errcols (list of str or None) – The errcol keys to use from the lcdict when running period-finding.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 pfmethods (list of str) – This is a list of period-finding methods to run. Each element is a string matching one of the keys of the PFMETHODS dict defined in the source of this module. By default, this runs GLS, PDM, AoV-MH, and the spectral window Lomb-Scargle periodogram.
 pfkwargs (list of dicts) – This is used to provide any special kwargs as dicts to each period-finding method function specified in pfmethods, one dict per method, in the same order as pfmethods.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input time-series.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the time-series (with non-finite elems removed) will be passed through to the output.
 getblssnr (bool) – If this is True and BLS is one of the methods specified in pfmethods, will also calculate the stats for each best period in the BLS results: transit depth, duration, ingress duration, refit period and epoch, and the SNR of the transit.
 nperiodworkers (int) – The number of parallel periodfinding workers to launch per object task.
 ncontrolworkers (int) – The number of controlling processes to launch. This effectively sets how many objects from lclist will be processed in parallel.
 liststartindex (int or None) – This sets the index from where to start in lclist.
 listmaxobjects (int or None) – This sets the maximum number of objects in lclist to run periodfinding for in this invocation. Together with liststartindex, listmaxobjects can be used to distribute processing over several independent machines if the number of light curves is very large.
 minobservations (int) – The minimum number of finite LC points required to process a light curve.
 excludeprocessed (bool) –
If this is True, light curves that have existing periodfinding result pickles in outdir will not be processed.
FIXME: currently, this uses a dumb method of excluding already-processed files. A smarter way to do this is to (i) generate a SHA512 cachekey based on a repr of {‘lcfile’, ‘timecols’, ‘magcols’, ‘errcols’, ‘lcformat’, ‘pfmethods’, ‘sigclip’, ‘getblssnr’, ‘pfkwargs’}, (ii) make sure all list kwargs in the dict are sorted, (iii) check if the output file has the same cachekey in its filename (last 8 chars of cachekey should work), so the result was processed in exactly the same way as specified in the input to this function, and can therefore be ignored. Will implement this later.
Returns: A list of the period-finding pickles created for all of the input LCs processed.
Return type: list of str
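Splitting one large lclist across several independent machines with liststartindex and listmaxobjects might look like the sketch below. The machine_slices helper and the file names are hypothetical, and the sketch assumes that each invocation processes the slice lclist[liststartindex:liststartindex + listmaxobjects].

```python
def machine_slices(nlcs, nmachines):
    """Yield (liststartindex, listmaxobjects) pairs covering nlcs items."""
    if nlcs <= 0 or nmachines <= 0:
        return
    chunk = -(-nlcs // nmachines)   # ceiling division
    for start in range(0, nlcs, chunk):
        yield start, min(chunk, nlcs - start)


lclist = ['lc-%04d.pkl' % i for i in range(10)]   # hypothetical file names
for start, nobj in machine_slices(len(lclist), 3):
    # On each machine one would then call, e.g.:
    # parallel_pf(lclist, outdir, liststartindex=start, listmaxobjects=nobj)
    print(start, nobj, lclist[start:start + nobj])
```

Every machine needs the same lclist in the same order for the slices to be disjoint and complete.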

astrobase.lcproc.periodsearch.
parallel_pf_lcdir
(lcdir, outdir, fileglob=None, recursive=True, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, pfmethods=('gls', 'pdm', 'mav', 'win'), pfkwargs=({}, {}, {}, {}), sigclip=10.0, getblssnr=False, nperiodworkers=2, ncontrolworkers=1, liststartindex=None, listmaxobjects=None, minobservations=500, excludeprocessed=True)[source]¶ This runs parallel light curve period-finding for a directory of LCs.
Parameters:  lcdir (str) – The directory containing the LCs to process.
 outdir (str) – The directory where the resulting periodfinding pickles will go.
 fileglob (str or None) – The UNIX file glob to use to search for LCs in lcdir. If None, the default file glob associated with the registered LC format will be used instead.
 recursive (bool) – If True, will search recursively in lcdir for light curves to process.
 timecols (list of str or None) – The timecol keys to use from the lcdict when running period-finding.
 magcols (list of str or None) – The magcol keys to use from the lcdict when running period-finding.
 errcols (list of str or None) – The errcol keys to use from the lcdict when running period-finding.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 pfmethods (list of str) – This is a list of period-finding methods to run. Each element is a string matching one of the keys of the PFMETHODS dict defined in the source of this module. By default, this runs GLS, PDM, AoV-MH, and the spectral window Lomb-Scargle periodogram.
 pfkwargs (list of dicts) – This is used to provide any special kwargs as dicts to each period-finding method function specified in pfmethods, one dict per method, in the same order as pfmethods.
 sigclip (float or int or sequence of two floats/ints or None) –
If a single float or int, a symmetric sigma-clip will be performed using the number provided as the sigma-multiplier to cut out from the input time-series.
If a list of two ints/floats is provided, the function will perform an ‘asymmetric’ sigma-clip. The first element in this list is the sigma value to use for fainter flux/mag values; the second element in this list is the sigma value to use for brighter flux/mag values. For example, sigclip=[10., 3.] will sigclip out greater than 10-sigma dimmings and greater than 3-sigma brightenings. Here the meaning of “dimming” and “brightening” is set by physics (not the magnitude system), which is why the magsarefluxes kwarg must be correctly set.
If sigclip is None, no sigma-clipping will be performed, and the time-series (with non-finite elems removed) will be passed through to the output.
 getblssnr (bool) – If this is True and BLS is one of the methods specified in pfmethods, will also calculate the stats for each best period in the BLS results: transit depth, duration, ingress duration, refit period and epoch, and the SNR of the transit.
 nperiodworkers (int) – The number of parallel periodfinding workers to launch per object task.
 ncontrolworkers (int) – The number of controlling processes to launch. This effectively sets how many objects from lclist will be processed in parallel.
 liststartindex (int or None) – This sets the index from where to start in lclist.
 listmaxobjects (int or None) – This sets the maximum number of objects in lclist to run periodfinding for in this invocation. Together with liststartindex, listmaxobjects can be used to distribute processing over several independent machines if the number of light curves is very large.
 minobservations (int) – The minimum number of finite LC points required to process a light curve.
 excludeprocessed (bool) –
If this is True, light curves that have existing periodfinding result pickles in outdir will not be processed.
FIXME: currently, this uses a dumb method of excluding already-processed files. A smarter way to do this is to (i) generate a SHA512 cachekey based on a repr of {‘lcfile’, ‘timecols’, ‘magcols’, ‘errcols’, ‘lcformat’, ‘pfmethods’, ‘sigclip’, ‘getblssnr’, ‘pfkwargs’}, (ii) make sure all list kwargs in the dict are sorted, (iii) check if the output file has the same cachekey in its filename (last 8 chars of cachekey should work), so the result was processed in exactly the same way as specified in the input to this function, and can therefore be ignored. Will implement this later.
Returns: A list of the periodfinding pickles created for all of the input LCs processed.
Return type: list of str
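The cache-key scheme described in the FIXME note above is not implemented in this function. As a rough sketch only, steps (i) to (iii) could look like the following. The helper names pf_cachekey and already_processed are hypothetical and not part of Astrobase. Keeping each column triplet and each method/kwargs pair together while sorting is an assumption made here, so that sorting does not scramble which kwargs belong to which method.

```python
import hashlib


def pf_cachekey(lcfile, timecols, magcols, errcols, lcformat,
                pfmethods, sigclip, getblssnr, pfkwargs):
    """Hypothetical helper: SHA-512 key over the period-finding inputs."""
    # step (ii): sort list kwargs, keeping each (time, mag, err) triplet intact
    cols = sorted(zip(timecols, magcols, errcols))
    # keep each method paired with its kwargs; kwargs are sorted by key name
    methods = sorted(
        (method, repr(sorted(kwargs.items())))
        for method, kwargs in zip(pfmethods, pfkwargs)
    )
    keydict = {
        'lcfile': lcfile,
        'cols': cols,
        'lcformat': lcformat,
        'methods': methods,
        'sigclip': sigclip,
        'getblssnr': getblssnr,
    }
    # step (i): hash a deterministic repr of the inputs
    payload = repr(sorted(keydict.items()))
    return hashlib.sha512(payload.encode('utf-8')).hexdigest()


def already_processed(existing_pickles, cachekey):
    """Step (iii): look for the last 8 chars of the key in any filename."""
    suffix = cachekey[-8:]
    return any(suffix in fname for fname in existing_pickles)
```

With a scheme like this, the same light curve processed with a different sigclip or a different set of methods gets a different key, so it would not be skipped by mistake.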
astrobase.lcproc.tfa module¶
This contains functions to run the Trend Filtering Algorithm (TFA) in a parallelized manner on large collections of light curves.

astrobase.lcproc.tfa.
tfa_templates_lclist
(lclist, outfile, lcinfo_pkl=None, target_template_frac=0.1, max_target_frac_obs=0.25, min_template_number=10, max_template_number=1000, max_rms=0.15, max_mult_above_magmad=1.5, max_mult_above_mageta=1.5, mag_bandpass='sdssr', custom_bandpasses=None, mag_bright_limit=10.0, mag_faint_limit=12.0, process_template_lcs=True, template_sigclip=5.0, template_interpolate='nearest', lcformat='hatsql', lcformatdir=None, timecols=None, magcols=None, errcols=None, nworkers=2, maxworkertasks=1000)[source]¶ This selects template objects for TFA.
Selection criteria for TFA template ensemble objects:
 not variable: use a poly fit to the mag-MAD relation and the eta-normal variability index to get non-variable objects
 no more than 10% of the total number of objects in the field, no more than max_tfa_templates, and no more than max_target_frac_obs x template_ndet objects
 allow shuffling of the templates if the target ends up in them
 nothing with less than the median number of observations in the field
 sigmaclip the input time series observations
 TODO: select randomly in xi-eta space. This doesn’t seem to make a huge difference at the moment, so those bits have been removed for now. This function makes plots of xi-eta for the selected template objects so the distributions can be visualized.
This also determines the effective cadence that all TFA LCs will be binned to: the template LC with the largest number of non-NaN observations will be used as the timebase. All template LCs will be renormed to zero.
Parameters:  lclist (list of str) – This is a list of light curves to use as input to generate the template set.
 outfile (str) – This is the pickle filename to which the TFA template list will be written.
 lcinfo_pkl (str or None) – If provided, is a file path to a pickle file created by this function on a previous run containing the LC information. This will be loaded directly instead of having to rerun LC info collection. If None, will be placed in the same directory as outfile.
 target_template_frac (float) – This is the fraction of total objects in lclist to use for the number of templates.
 max_target_frac_obs (float) – This sets the number of templates to generate if the number of observations for the light curves is smaller than the number of objects in the collection. The number of templates will be set to this fraction of the number of observations if this is the case.
 min_template_number (int) – This is the minimum number of templates to generate.
 max_template_number (int) – This is the maximum number of templates to generate. If target_template_frac times the number of objects is greater than max_template_number, only max_template_number templates will be used.
 max_rms (float) – This is the maximum light curve RMS for an object to consider it as a possible template ensemble member.
 max_mult_above_magmad (float) – This is the maximum multiplier above the magRMS fit to consider an object as variable and thus not part of the template ensemble.
 max_mult_above_mageta (float) – This is the maximum multiplier above the mageta (variable index) fit to consider an object as variable and thus not part of the template ensemble.
 mag_bandpass (str) – This sets the key in the light curve dict’s objectinfo dict to use as the canonical magnitude for the object and apply any magnitude limits to.
 custom_bandpasses (dict or None) – This can be used to provide any custom band name keys to the star feature collection function.
 mag_bright_limit (float or list of floats) – This sets the brightest mag (in the mag_bandpass filter) for a potential member of the TFA template ensemble. If this is a single float, the value will be used for all magcols. If this is a list of floats with len = len(magcols), the specific bright limits will be used for each magcol individually.
 mag_faint_limit (float or list of floats) – This sets the faintest mag (in the mag_bandpass filter) for a potential member of the TFA template ensemble. If this is a single float, the value will be used for all magcols. If this is a list of floats with len = len(magcols), the specific faint limits will be used for each magcol individually.
 process_template_lcs (bool) – If True, will reform the template light curves to the chosen timebase. If False, will only select light curves for templates but not process them. This is useful for initial exploration of how the template LCs are selected.
 template_sigclip (float or sequence of floats or None) – This sets the sigmaclip to be applied to the template light curves.
 template_interpolate (str) – This sets the kwarg to pass to scipy.interpolate.interp1d to set the kind of interpolation to use when reforming light curves to the TFA template timebase.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the features.
 magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the features.
 errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the features.
 nworkers (int) – The number of parallel workers to launch.
 maxworkertasks (int) – The maximum number of tasks to run per worker before it is replaced by a fresh one.
Returns: This function returns a dict that can be passed directly to apply_tfa_magseries below. It can optionally produce a pickle with the same dict, which can also be passed to that function.
Return type: dict
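The template-count parameters above interact, and one plausible reading of their descriptions is sketched below. The function name target_template_count and the nobjects/template_ndet arguments are hypothetical. Treating max_target_frac_obs as a final cap is an assumption based on the selection criteria listed for this function; the library's own ordering of these checks may differ.

```python
def target_template_count(nobjects, template_ndet,
                          target_template_frac=0.1,
                          max_target_frac_obs=0.25,
                          min_template_number=10,
                          max_template_number=1000):
    """Hypothetical helper: how many TFA templates to aim for."""
    # start from a fixed fraction of the objects in the collection
    ntemplates = int(target_template_frac * nobjects)

    # clamp to the [min_template_number, max_template_number] range
    ntemplates = min(ntemplates, max_template_number)
    ntemplates = max(ntemplates, min_template_number)

    # assumption: never use more templates than a fraction of the
    # number of observations in the timebase light curve
    obs_cap = int(max_target_frac_obs * template_ndet)
    if ntemplates > obs_cap:
        ntemplates = obs_cap

    return ntemplates
```

Under these assumptions, a field of 5000 objects with 10000 observations per light curve would use 500 templates, while a field of 50000 objects with only 2000 observations would be limited to 500 by the observation cap instead of the 1000 allowed by max_template_number.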

astrobase.lcproc.tfa.
apply_tfa_magseries
(lcfile, timecol, magcol, errcol, templateinfo, mintemplatedist_arcmin=10.0, lcformat='hatsql', lcformatdir=None, interp='nearest', sigclip=5.0)[source]¶ This applies the TFA correction to an LC given TFA template information.
Parameters:  lcfile (str) – This is the light curve file to apply the TFA correction to.
 timecol,magcol,errcol (str) – These are the column keys in the lcdict for the LC file to apply the TFA correction to.
 templateinfo (dict or str) – This is either the dict produced by tfa_templates_lclist or the pickle produced by the same function.
 mintemplatedist_arcmin (float) – This sets the minimum distance required from the target object for objects in the TFA template ensemble. Objects closer than this distance will be removed from the ensemble.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 interp (str) – This is passed to scipy.interpolate.interp1d as the kind of interpolation to use when reforming this light curve to the timebase of the TFA templates.
 sigclip (float or sequence of two floats or None) – This is the sigma clip to apply to this light curve before running TFA on it.
Returns: This returns the filename of the light curve file generated after TFA application. This is a pickle (that can be read by lcproc.read_pklc) in the same directory as lcfile. The magcol will be encoded in the filename, so each magcol in lcfile gets its own output file.
Return type: str
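The sigclip argument accepts a single float, a sequence of two floats, or None. The sketch below illustrates how these three cases could behave, using the convention that the first element applies to dimmings and the second to brightenings. It is not the Astrobase implementation: the function name sigclip_sketch is hypothetical, and using the median with 1.483 times the median absolute deviation as the scatter estimate is an assumption.

```python
import math
from statistics import median


def sigclip_sketch(times, mags, errs, sigclip=(10.0, 3.0),
                   magsarefluxes=False):
    """Hypothetical helper: returns the kept (time, mag, err) tuples."""
    # non-finite points are always dropped, even when sigclip is None
    points = [
        (t, m, e) for t, m, e in zip(times, mags, errs)
        if math.isfinite(t) and math.isfinite(m) and math.isfinite(e)
    ]
    if sigclip is None or not points:
        return points

    # a single number means a symmetric clip
    if isinstance(sigclip, (int, float)):
        dim_sigma = bright_sigma = float(sigclip)
    else:
        dim_sigma, bright_sigma = sigclip

    med = median(m for _, m, _ in points)
    # assumption: robust scatter estimate from the median absolute deviation
    stdev = 1.483 * median(abs(m - med) for _, m, _ in points)

    kept = []
    for t, m, e in points:
        # positive `dimming` always means physically fainter than the median:
        # larger values for magnitudes, smaller values for fluxes
        dimming = (med - m) if magsarefluxes else (m - med)
        if dimming > dim_sigma * stdev:
            continue  # too faint
        if -dimming > bright_sigma * stdev:
            continue  # too bright
        kept.append((t, m, e))
    return kept
```

Note how the same numerical outlier is treated as a dimming or a brightening depending on magsarefluxes, which is why that flag has to match the data.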

astrobase.lcproc.tfa.
parallel_tfa_lclist
(lclist, templateinfo, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, interp='nearest', sigclip=5.0, mintemplatedist_arcmin=10.0, nworkers=2, maxworkertasks=1000)[source]¶ This applies TFA in parallel to all LCs in the given list of file names.
Parameters:  lclist (list of str) – This is a list of light curve files to apply TFA correction to.
 templateinfo (dict or str) – This is either the dict produced by tfa_templates_lclist or the pickle produced by the same function.
 timecols (list of str or None) – The timecol keys to use from the lcdict in applying TFA corrections.
 magcols (list of str or None) – The magcol keys to use from the lcdict in applying TFA corrections.
 errcols (list of str or None) – The errcol keys to use from the lcdict in applying TFA corrections.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 interp (str) – This is passed to scipy.interpolate.interp1d as the kind of interpolation to use when reforming the light curves to the timebase of the TFA templates.
 sigclip (float or sequence of two floats or None) – This is the sigma clip to apply to the light curves before running TFA on them.
 mintemplatedist_arcmin (float) – This sets the minimum distance required from the target object for objects in the TFA template ensemble. Objects closer than this distance will be removed from the ensemble.
 nworkers (int) – The number of parallel workers to launch.
 maxworkertasks (int) – The maximum number of tasks per worker allowed before it’s replaced by a fresh one.
Returns: Contains the input file names and output TFA light curve filenames per input file organized by each magcol in magcols.
Return type: dict
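The effect of mintemplatedist_arcmin can be illustrated with a small sketch that drops template objects lying too close to the target on the sky. This uses a direct haversine great-circle formula for clarity, which is not necessarily how Astrobase computes distances. The function names and the 'ra'/'decl' keys (in decimal degrees) are hypothetical.

```python
import math


def angular_sep_arcmin(ra1, decl1, ra2, decl2):
    """Great-circle separation in arcminutes; inputs in decimal degrees."""
    ra1, decl1, ra2, decl2 = map(math.radians, (ra1, decl1, ra2, decl2))
    # haversine formula, numerically stable for small separations
    a = (math.sin((decl2 - decl1) / 2.0) ** 2 +
         math.cos(decl1) * math.cos(decl2) *
         math.sin((ra2 - ra1) / 2.0) ** 2)
    sep_rad = 2.0 * math.asin(min(1.0, math.sqrt(a)))
    return math.degrees(sep_rad) * 60.0


def templates_for_target(target_ra, target_decl, templates,
                         mintemplatedist_arcmin=10.0):
    """Keep only templates at least mintemplatedist_arcmin from the target."""
    return [
        tmpl for tmpl in templates
        if angular_sep_arcmin(target_ra, target_decl,
                              tmpl['ra'], tmpl['decl'])
        >= mintemplatedist_arcmin
    ]
```

A target that is itself a member of the template ensemble has zero separation from its own entry, so a filter of this kind also removes it from its own correction.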

astrobase.lcproc.tfa.
parallel_tfa_lcdir
(lcdir, templateinfo, lcfileglob=None, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, interp='nearest', sigclip=5.0, mintemplatedist_arcmin=10.0, nworkers=2, maxworkertasks=1000)[source]¶ This applies TFA in parallel to all LCs in a directory.
Parameters:  lcdir (str) – This is the directory containing the light curve files to process.
 templateinfo (dict or str) – This is either the dict produced by tfa_templates_lclist or the pickle produced by the same function.
 lcfileglob (str or None) – The UNIX file glob to use when searching for light curve files in lcdir. If None, the default file glob associated with registered LC format provided is used.
 timecols (list of str or None) – The timecol keys to use from the lcdict in applying TFA corrections.
 magcols (list of str or None) – The magcol keys to use from the lcdict in applying TFA corrections.
 errcols (list of str or None) – The errcol keys to use from the lcdict in applying TFA corrections.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 interp (str) – This is passed to scipy.interpolate.interp1d as the kind of interpolation to use when reforming the light curves to the timebase of the TFA templates.
 sigclip (float or sequence of two floats or None) – This is the sigma clip to apply to the light curves before running TFA on them.
 mintemplatedist_arcmin (float) – This sets the minimum distance required from the target object for objects in the TFA template ensemble. Objects closer than this distance will be removed from the ensemble.
 nworkers (int) – The number of parallel workers to launch.
 maxworkertasks (int) – The maximum number of tasks per worker allowed before it’s replaced by a fresh one.
Returns: Contains the input file names and output TFA light curve filenames per input file organized by each magcol in magcols.
Return type: dict
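The lcfileglob fallback described above can be pictured with the standard library alone. In this sketch, find_lcfiles is a hypothetical helper and default_glob stands in for whatever file glob is registered with the chosen LC format; neither name comes from Astrobase.

```python
import glob
import os


def find_lcfiles(lcdir, lcfileglob=None, default_glob='*.pkl'):
    """Hypothetical helper: list light curve files in a directory."""
    # fall back to the format's default glob when none is given
    pattern = lcfileglob if lcfileglob is not None else default_glob
    # sort so the resulting list is in a reproducible order
    return sorted(glob.glob(os.path.join(lcdir, pattern)))
```

The resulting list is the kind of input that parallel_tfa_lclist takes as lclist.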
astrobase.lcproc.varthreshold module¶
This contains functions to investigate where to set a threshold for several variability indices to distinguish between variable and nonvariable stars.

astrobase.lcproc.varthreshold.
variability_threshold
(featuresdir, outfile, magbins=array([ 8., 8.25, 8.5, 8.75, 9., 9.25, 9.5, 9.75, 10., 10.25, 10.5, 10.75, 11., 11.25, 11.5, 11.75, 12., 12.25, 12.5, 12.75, 13., 13.25, 13.5, 13.75, 14., 14.25, 14.5, 14.75, 15., 15.25, 15.5, 15.75, 16. ]), maxobjects=None, timecols=None, magcols=None, errcols=None, lcformat='hatsql', lcformatdir=None, min_lcmad_stdev=5.0, min_stetj_stdev=2.0, min_iqr_stdev=2.0, min_inveta_stdev=2.0, verbose=True)[source]¶ This generates a list of objects with stetson J, IQR, and 1.0/eta above some threshold value to select them as potential variable stars.
Use this to pare down the objects to review and put through periodfinding. This does the thresholding per magnitude bin; this should be better than one single cut through the entire magnitude range. Set the magnitude bins using the magbins kwarg.
FIXME: implement a voting classifier here. This will choose variables based on the thresholds in IQR, stetson, and inveta based on weighting carried over from the variability recovery sims.
Parameters:  featuresdir (str) – This is the directory containing variability feature pickles created by astrobase.lcproc.lcvfeatures.parallel_varfeatures() or similar.
 outfile (str) – This is the output pickle file that will contain all the threshold information.
 magbins (np.array of floats) – This sets the magnitude bins to use for calculating thresholds.
 maxobjects (int or None) – This is the number of objects to process. If None, all objects with feature pickles in featuresdir will be processed.
 timecols (list of str or None) – The timecol keys to use from the lcdict in calculating the thresholds.
 magcols (list of str or None) – The magcol keys to use from the lcdict in calculating the thresholds.
 errcols (list of str or None) – The errcol keys to use from the lcdict in calculating the thresholds.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 min_lcmad_stdev,min_stetj_stdev,min_iqr_stdev,min_inveta_stdev (float or np.array) – These are the standard deviation multipliers for the distributions of light curve standard deviation, Stetson J variability index, the light curve interquartile range, and 1/eta variability index respectively. These multipliers set the minimum values of these measures to use for selecting variable stars. If provided as floats, the same value will be used for all magbins. If provided as np.arrays of size = magbins.size - 1, will be used to apply possibly different sigma cuts for each magbin.
 verbose (bool) – If True, will report progress and warn about any problems.
Returns: Contains all of the variability threshold information along with indices into the array of the object IDs chosen as variables.
Return type: dict
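The per-magnitude-bin thresholding described above can be sketched for a single variability index as follows. This is an illustration under stated assumptions and not the Astrobase code: the function name is hypothetical, and the cut is assumed to be the bin median plus the multiplier times the bin's population standard deviation.

```python
from statistics import median, pstdev


def magbin_threshold_sketch(mags, index_values, magbins, min_stdev=2.0):
    """Hypothetical helper: per-bin cuts and indices of selected objects."""
    nbins = len(magbins) - 1
    # a single number is applied to every magnitude bin
    if isinstance(min_stdev, (int, float)):
        min_stdev = [float(min_stdev)] * nbins
    if len(min_stdev) != nbins:
        raise ValueError('need one multiplier per magnitude bin')

    thresholds, selected = [], []
    for i in range(nbins):
        lo, hi = magbins[i], magbins[i + 1]
        members = [j for j, m in enumerate(mags) if lo <= m < hi]
        if len(members) < 2:
            thresholds.append(None)  # too few objects to set a cut
            continue
        vals = [index_values[j] for j in members]
        # assumed form of the cut: median + multiplier x stdev in this bin
        cut = median(vals) + min_stdev[i] * pstdev(vals)
        thresholds.append(cut)
        selected.extend(j for j in members if index_values[j] > cut)

    return thresholds, sorted(selected)
```

Because each bin gets its own cut, a mildly variable faint star is compared only against stars of similar brightness and is not judged against a single threshold set over the whole magnitude range.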

astrobase.lcproc.varthreshold.
plot_variability_thresholds
(varthreshpkl, xmin_lcmad_stdev=5.0, xmin_stetj_stdev=2.0, xmin_iqr_stdev=2.0, xmin_inveta_stdev=2.0, lcformat='hatsql', lcformatdir=None, magcols=None)[source]¶ This makes plots for the variability threshold distributions.
Parameters:  varthreshpkl (str) – The pickle produced by variability_threshold above.
 xmin_lcmad_stdev,xmin_stetj_stdev,xmin_iqr_stdev,xmin_inveta_stdev (float or np.array) – Threshold values that override the ones in varthreshpkl. If provided, the thresholds will be plotted using these values instead of the ones in the input pickle.
 lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
 lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
 magcols (list of str or None) – The magcol keys to use from the lcdict.
Returns: The file name of the threshold plot generated.
Return type: str
 astrobase.lcfit: functions for fitting various light curve models to observations, including sinusoidal, trapezoidal and full Mandel-Agol planet transits, eclipses, and splines.
 astrobase.lcmath: functions for light curve operations such as phasing, normalization, binning (in time and phase), sigma-clipping, external parameter decorrelation (EPD), etc.
 astrobase.lcmodels: modules that contain simple models for several variable star classes, including sinusoidal variables, eclipsing binaries, and transiting planets. Useful for fitting these with the functions in the astrobase.lcfit module.
 astrobase.varbase: functions for dealing with periodic signals including masking and pre-whitening them, ACF calculations, light curve detrending, and specific tools for planetary transits.
 astrobase.plotbase: functions to plot light curves, phased light curves, periodograms, and download Digitized Sky Survey cutouts from the NASA SkyView service.
 astrobase.lcproc: driver functions for running an end-to-end pipeline including: (i) object selection from a collection of light curves by position, cross-matching to external catalogs, or light curve objectinfo keys, (ii) running variability feature calculation and detection, (iii) running periodfinding, and (iv) object review using the checkplotserver webapp for variability classification. This also contains an Amazon AWS-enabled lcproc implementation.
astrobase.checkplot package¶
Contains functions to make checkplots: quick views for determining periodic variability for light curves and sanitychecking results from periodfinding functions (e.g., from periodbase).
The astrobase.checkplot.pkl.checkplot_pickle() function takes, for a single object, an arbitrary number of results from independent periodfinding functions (e.g. BLS, PDM, AoV, GLS, etc.) in periodbase, and generates a pickle file that contains object and variability information, finder chart, mag series plot, and for each periodfinding result: a periodogram and phased mag series plots for an arbitrary number of ‘best periods’.
This is intended for use with an external checkplot viewer: the Tornado webapp checkplotserver.py, but you can also use the astrobase.checkplot.pkl_png.checkplot_pickle_to_png() function to render this to a PNG that will look something like:
[ finder ] [ objectinfo ] [ variableinfo ] [ unphased LC ]
[ periodogram1 ] [ phased LC P1 ] [ phased LC P2 ] [ phased LC P3 ]
[ periodogram2 ] [ phased LC P1 ] [ phased LC P2 ] [ phased LC P3 ]
.
.
[ periodogramN ] [ phased LC P1 ] [ phased LC P2 ] [ phased LC P3 ]
for N independent periodfinding methods producing:
 periodogram1,2,3…N: the periodograms from each method
 phased LC P1,P2,P3: the phased lightcurves using the best 3 peaks in each periodogram
The astrobase.checkplot.png.checkplot_png() function takes a single periodfinding result and makes the following 3 x 3 grid and writes to a PNG:
[LSP plot + objectinfo] [ unphased LC ] [ period 1 phased LC ]
[period 1 phased LC /2] [period 1 phased LC x2] [ period 2 phased LC ]
[ period 3 phased LC ] [period 4 phased LC ] [ period 5 phased LC ]
The astrobase.checkplot.png.twolsp_checkplot_png() function makes a similar plot for two independent periodfinding routines and writes to a PNG:
[ pgram1 + objectinfo ] [ pgram2 ] [ unphased LC ]
[ pgram1 P1 phased LC ] [ pgram1 P2 phased LC ] [ pgram1 P3 phased LC ]
[ pgram2 P1 phased LC ] [ pgram2 P2 phased LC ] [ pgram2 P3 phased LC ]
where:
 pgram1 is the plot for the periodogram in the lspinfo1 dict
 pgram1 P1, P2, and P3 are the best three periods from lspinfo1
 pgram2 is the plot for the periodogram in the lspinfo2 dict
 pgram2 P1, P2, and P3 are the best three periods from lspinfo2