astrobase.lcproc.catalogs module

This contains functions to generate light curve catalogs from collections of light curves.

astrobase.lcproc.catalogs.make_lclist(basedir, outfile, use_list_of_filenames=None, lcformat='hat-sql', lcformatdir=None, fileglob=None, recursive=True, columns=('objectid', 'objectinfo.ra', 'objectinfo.decl', 'objectinfo.ndet'), makecoordindex=('objectinfo.ra', 'objectinfo.decl'), field_fitsfile=None, field_wcsfrom=None, field_scale=<astropy.visualization.interval.ZScaleInterval object>, field_stretch=<astropy.visualization.stretch.LinearStretch object>, field_colormap=<matplotlib.colors.LinearSegmentedColormap object>, field_findersize=None, field_pltopts={'marker': 'o', 'markeredgecolor': 'red', 'markeredgewidth': 2.0, 'markerfacecolor': 'none', 'markersize': 10.0}, field_grid=False, field_gridcolor='k', field_zoomcontain=True, maxlcs=None, nworkers=2)

This generates a light curve catalog for all light curves in a directory.

Given a base directory containing the light curves and a light curve format, this will find all light curves, pull out the keys requested in the columns kwarg from each object's lcdict, and write them to the requested output pickle file. These keys should point to scalar values (e.g. something like objectinfo.ra is OK, but something like ‘times’ won’t work because it’s a vector).

Generally, this works with light curve reading functions that produce lcdicts as detailed in the docstring for lcproc.register_lcformat. Once you’ve registered your light curve reader functions using the lcproc.register_lcformat function, pass in the formatkey associated with your light curve format, and this function will be able to read all light curves in that format as well as the object information stored in their objectinfo dict.
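The overall flow described above can be pictured with a small, standard-library-only sketch. It is not astrobase's implementation: `read_lc` is a hypothetical stand-in for the reader function your registered LC format provides (here assumed to load JSON-serialized lcdicts), `get_dotted_key` and `sketch_make_lclist` are hypothetical names, and the layout of the pickled dict is illustrative rather than the real catalog's layout. Column addresses are resolved with the period-separated scheme described under the columns kwarg; parallel workers, the kdtree, and finder charts are omitted.

```python
import glob
import json
import os
import pickle


def get_dotted_key(lcdict, address):
    """Resolve an address like 'objectinfo.ra' to lcdict['objectinfo']['ra']."""
    value = lcdict
    for part in address.split('.'):
        value = value[part]
    return value


def read_lc(path):
    """Hypothetical LC reader: assumes each file holds a JSON-serialized lcdict."""
    with open(path) as infd:
        return json.load(infd)


def sketch_make_lclist(basedir, outfile, fileglob='*.json', recursive=True,
                       columns=('objectid', 'objectinfo.ra', 'objectinfo.decl')):
    # with recursive=True, '**' matches basedir itself and all subdirectories
    if recursive:
        pattern = os.path.join(basedir, '**', fileglob)
    else:
        pattern = os.path.join(basedir, fileglob)
    lcfiles = sorted(glob.glob(pattern, recursive=recursive))

    # illustrative layout: one list per requested column plus the LC file path
    objects = {'lcfname': []}
    for col in columns:
        objects[col] = []

    for lcf in lcfiles:
        lcdict = read_lc(lcf)
        objects['lcfname'].append(lcf)
        for col in columns:
            try:
                objects[col].append(get_dotted_key(lcdict, col))
            except (KeyError, TypeError):
                # a missing key becomes None instead of aborting the whole run
                objects[col].append(None)

    catalog = {'columns': list(columns), 'nfiles': len(lcfiles),
               'objects': objects}
    with open(outfile, 'wb') as outfd:
        pickle.dump(catalog, outfd)
    return outfile
```

Reading every file is the expensive step in this flow, which is why supplying use_list_of_filenames (skipping the directory search) helps on large collections.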

Parameters:
  • basedir (str or list of str) –

    If this is a str, it points to a single directory to search for light curves. If this is a list of str, it must be a list of directories to search for light curves. All of these will be searched to find light curve files matching either your light curve format’s default fileglob (set when you registered your LC format) or a specific fileglob that you can pass in using the fileglob kwarg here. If the recursive kwarg is set, the provided directories will be searched recursively.

    If use_list_of_filenames is not None, it will override this argument and the function will take those light curves as the list of files it must process instead of whatever is specified in basedir.

  • outfile (str) – This is the name of the output file to write. This will be a pickle file, so a good convention to use for this name is something like ‘my-lightcurve-catalog.pkl’.
  • use_list_of_filenames (list of str or None) – Use this kwarg to override whatever is provided in basedir and directly pass in a list of light curve files to process. This can speed up this function considerably because no searches on disk will be performed to find light curve files matching basedir and fileglob.
  • lcformat (str) – This is the formatkey associated with your light curve format, which you previously passed in to the lcproc.register_lcformat function. This will be used to look up how to find and read the light curves specified in basedir or use_list_of_filenames.
  • lcformatdir (str or None) – If this is provided, gives the path to a directory where you’ve stored your lcformat description JSONs, other than the usual directories lcproc knows to search for them in. Use this along with lcformat to specify an LC format JSON file that’s not currently registered with lcproc.
  • fileglob (str or None) – If provided, is a string that is a valid UNIX filename glob. Used to override the default fileglob for this LC format when searching for light curve files in basedir.
  • recursive (bool) – If True, the directories specified in basedir will be searched recursively for all light curve files that match the default fileglob for this LC format or a specific one provided in fileglob.
  • columns (list of str) –

    This is a list of keys in the lcdict produced by your light curve reader function that contain object information, which will be extracted and put into the output light curve catalog. It’s highly recommended that your LC reader function produce an lcdict that contains at least the default keys shown here.

    The lcdict keys to extract are specified by using an address scheme:

    • First level dict keys can be specified directly: e.g., ‘objectid’ will extract lcdict[‘objectid’]
    • Keys at other levels can be specified by using a period to indicate the level:
      • e.g., ‘objectinfo.ra’ will extract lcdict[‘objectinfo’][‘ra’]
      • e.g., ‘objectinfo.varinfo.features.stetsonj’ will extract lcdict[‘objectinfo’][‘varinfo’][‘features’][‘stetsonj’]
  • makecoordindex (list of two str or None) – This is used to specify which lcdict keys contain the right ascension and declination coordinates for this object. If these are provided, the output light curve catalog will have a kdtree built on all object coordinates, which enables fast spatial searches and cross-matching to external catalogs by checkplot and lcproc functions.
  • field_fitsfile (str or None) – If this is not None, it should be the path to a FITS image containing the objects these light curves are for. If this is provided, make_lclist will use the WCS information in the FITS header itself if field_wcsfrom is None (or from a WCS header file pointed to by field_wcsfrom) to obtain x and y pixel coordinates for all of the objects in the field. A finder chart will also be made using astrobase.plotbase.fits_finder_chart with the corresponding field_scale, _stretch, _colormap, _findersize, _pltopts, _grid, and _gridcolor kwargs for that function, which are reproduced here to enable customization of the finder chart plot.
  • field_wcsfrom (str or None) – If field_wcsfrom is None, the WCS used to transform RA/Dec to pixel x/y will be taken from the FITS header of field_fitsfile. If this is not None, it must be a FITS or similar file that contains a WCS header in its first extension.
  • field_scale (astropy.visualization.Interval object) – This sets the normalization for the FITS pixel values. It is an astropy.visualization Interval object. See http://docs.astropy.org/en/stable/visualization/normalization.html for details on scale and stretch objects.
  • field_stretch (astropy.visualization.Stretch object) – This sets the stretch function for mapping FITS pixel values to output pixel values. It is an astropy.visualization Stretch object. See http://docs.astropy.org/en/stable/visualization/normalization.html for details on scale and stretch objects.
  • field_colormap (matplotlib Colormap object) – This is the matplotlib colormap object to use for the output image.
  • field_findersize (None or tuple of two ints) – If this is None, the output image size will be set by the NAXIS1 and NAXIS2 keywords in the field_fitsfile FITS header. Otherwise, it must be a tuple with the intended x and y size of the image in inches (all output images use 100 DPI).
  • field_pltopts (dict) – This controls how the overlay points will be plotted. It is a dict with standard matplotlib marker kwargs as key-value pairs, e.g. ‘markersize’, ‘markerfacecolor’, etc. The default options draw red outline circles at the location of each object in the overlay.
  • field_grid (bool) – This sets whether a grid will be drawn on the output image.
  • field_gridcolor (str) – This sets the color of the grid lines. It is a standard matplotlib color spec string.
  • field_zoomcontain (bool) – This controls whether the finder chart will be zoomed to just contain the overlaid points. Everything outside the footprint of these points will be discarded.
  • maxlcs (int or None) – This sets the maximum number of light curves to process from the input LC list, whether that list was generated by searching for LCs in basedir or provided directly as use_list_of_filenames. If None, all light curves in the list are processed.
  • nworkers (int) – This sets the number of parallel workers to launch to collect information from the light curves.
Returns:

Returns the path to the generated light curve catalog pickle file.

Return type:

str
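When makecoordindex is set, the catalog gets a kdtree built on object coordinates. A common construction, assumed here since the text above does not spell it out, converts RA/Dec to unit vectors so that a Euclidean radius query corresponds to an angular radius through the chord length 2 sin(θ/2). The minimal sketch below uses a brute-force scan in place of a real kdtree (such as scipy.spatial.cKDTree) so that it runs with only the standard library; all function names are hypothetical.

```python
import math


def radec_to_xyz(ra_deg, decl_deg):
    """Convert RA/Dec in degrees to a unit vector on the celestial sphere."""
    ra, decl = math.radians(ra_deg), math.radians(decl_deg)
    return (math.cos(decl) * math.cos(ra),
            math.cos(decl) * math.sin(ra),
            math.sin(decl))


def angle_to_chord(radius_deg):
    """Chord length on the unit sphere subtending an angle of radius_deg."""
    return 2.0 * math.sin(math.radians(radius_deg) / 2.0)


def cone_search(ras, decls, center_ra, center_decl, radius_deg):
    """Return indices of objects within radius_deg of the center (brute force).

    A kdtree over the same unit vectors answers this Euclidean radius
    query without checking every object.
    """
    center = radec_to_xyz(center_ra, center_decl)
    max_chord = angle_to_chord(radius_deg)
    matches = []
    for i, (ra, decl) in enumerate(zip(ras, decls)):
        if math.dist(radec_to_xyz(ra, decl), center) <= max_chord:
            matches.append(i)
    return matches
```

Working in Cartesian coordinates also handles the RA = 0/360 wraparound without special cases: an object at RA 359.9 is found by a search centered on RA 0.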

astrobase.lcproc.catalogs.filter_lclist(lc_catalog, objectidcol='objectid', racol='ra', declcol='decl', xmatchexternal=None, xmatchdistarcsec=3.0, externalcolnums=(0, 1, 2), externalcolnames=('objectid', 'ra', 'decl'), externalcoldtypes='U20, f8, f8', externalcolsep=None, externalcommentchar='#', conesearch=None, conesearchworkers=1, columnfilters=None, field_fitsfile=None, field_wcsfrom=None, field_scale=<astropy.visualization.interval.ZScaleInterval object>, field_stretch=<astropy.visualization.stretch.LinearStretch object>, field_colormap=<matplotlib.colors.LinearSegmentedColormap object>, field_findersize=None, field_pltopts={'marker': 'o', 'markeredgecolor': 'red', 'markeredgewidth': 2.0, 'markerfacecolor': 'none', 'markersize': 10.0}, field_grid=False, field_gridcolor='k', field_zoomcontain=True, copylcsto=None)

This is used to perform cone-search, cross-match, and column-filter operations on a light curve catalog generated by make_lclist.

Uses the output of make_lclist above. This function returns a list of light curves matching various criteria specified by the xmatchexternal, conesearch, and columnfilters kwargs. Use this function to generate input lists for other lcproc functions, e.g. lcproc.lcvfeatures.parallel_varfeatures, lcproc.periodfinding.parallel_pf, and lcproc.lcbin.parallel_timebin, among others.
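Cross-matching against xmatchexternal amounts to finding, for each external object, a catalog object within xmatchdistarcsec. The sketch below is a minimal brute-force version using the haversine formula for angular separation. It is not astrobase's implementation (the make_lclist description says the catalog kdtree is what enables fast cross-matching), the choice to keep only the nearest catalog object per external object is an assumption, and the function names are hypothetical.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2.0) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(hav)))) * 3600.0


def xmatch(cat_ids, cat_ras, cat_decls, ext_ids, ext_ras, ext_decls,
           xmatchdistarcsec=3.0):
    """Return (catalog objectid, external objectid) pairs within the limit.

    Each external object is paired with its nearest catalog object,
    provided that object lies within xmatchdistarcsec.
    """
    pairs = []
    for eid, era, edecl in zip(ext_ids, ext_ras, ext_decls):
        best = None
        for cid, cra, cdecl in zip(cat_ids, cat_ras, cat_decls):
            sep = angular_sep_arcsec(cra, cdecl, era, edecl)
            if sep <= xmatchdistarcsec and (best is None or sep < best[1]):
                best = (cid, sep)
        if best is not None:
            pairs.append((best[0], eid))
    return pairs
```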

The operations are applied in this order if more than one is specified: xmatchexternal -> conesearch -> columnfilters. All results from these operations are joined using a logical AND operation.
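Joining the results with a logical AND can be pictured as intersecting per-object boolean masks in the stated order. This is an illustrative sketch only: the masks are plain lists of bools and the function name is hypothetical.

```python
def combine_selections(nobjects, xmatch_mask=None, cone_mask=None,
                       column_masks=()):
    """AND together whichever per-object selection masks were requested.

    Masks are lists of bools, one per catalog object, taken in the order
    xmatchexternal -> conesearch -> columnfilters. A mask of None means
    that operation was not requested and selects everything.
    """
    selected = [True] * nobjects
    for mask in (xmatch_mask, cone_mask, *column_masks):
        if mask is None:
            continue
        selected = [s and m for s, m in zip(selected, mask)]
    # AND is commutative, so the final selection does not depend on the order
    return [i for i, s in enumerate(selected) if s]
```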

Parameters:
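  • lc_catalog (str) – This is the path to the light curve catalog pickle file produced by make_lclist.
  • objectidcol (str) – This is the name of the object ID column in the light curve catalog.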
  • racol (str) – This is the name of the RA column in the light curve catalog.
  • declcol (str) – This is the name of the Dec column in the light curve catalog.
  • xmatchexternal (str or None) – If provided, this is the filename of a text file containing object IDs, RAs, and Decs, used to match the objects in the light curve catalog by their positions.
  • xmatchdistarcsec (float) – This is the distance in arcseconds to use when cross-matching to the external catalog in xmatchexternal.
  • externalcolnums (sequence of int) – This is a list of the zero-indexed column numbers of columns to extract from the external catalog file.
  • externalcolnames (sequence of str) – This is a list of names of columns that will be extracted from the external catalog file. It must be the same length as externalcolnums. These names must include the ones used for the objectid, ra, and decl columns so this function knows which column numbers correspond to those columns and can use them to set up the cross-match.
  • externalcoldtypes (str) – This is a CSV string containing numpy dtype definitions for all columns listed to extract from the external catalog file. The number of dtype definitions should be equal to the number of columns to extract.
  • externalcolsep (str or None) – The column separator to use when extracting columns from the external catalog file. If None, any whitespace between columns is used as the separator.
  • externalcommentchar (str) – The character indicating that a line in the external catalog file is to be ignored.
  • conesearch (list of float or None) –

    This is used to specify cone-search parameters. It should be a three-element list:

    [center_ra_deg, center_decl_deg, search_radius_deg]

  • conesearchworkers (int) – The number of parallel workers to launch for the cone-search operation.
  • columnfilters (list of str or None) –

    This is a list of strings indicating any filters to apply on each column in the light curve catalog. All column filters are applied in the specified sequence and are combined with a logical AND operator. The format of each filter string should be:

    ‘<lc_catalog column>|<operator>|<operand>’

    where:

    • <lc_catalog column> is a column in the lc_catalog pickle file
    • <operator> is one of: ‘lt’, ‘gt’, ‘le’, ‘ge’, ‘eq’, ‘ne’, which correspond to the usual operators: <, >, <=, >=, ==, != respectively.
    • <operand> is a float, int, or string.
  • field_fitsfile (str or None) – If this is not None, it should be the path to a FITS image containing the objects these light curves are for. If this is provided, filter_lclist will use the WCS information in the FITS header itself if field_wcsfrom is None (or from a WCS header file pointed to by field_wcsfrom) to obtain x and y pixel coordinates for all of the objects in the field. A finder chart will also be made using astrobase.plotbase.fits_finder_chart with the corresponding field_scale, _stretch, _colormap, _findersize, _pltopts, _grid, and _gridcolor kwargs for that function, which are reproduced here to enable customization of the finder chart plot.
  • field_wcsfrom (str or None) – If field_wcsfrom is None, the WCS used to transform RA/Dec to pixel x/y will be taken from the FITS header of field_fitsfile. If this is not None, it must be a FITS or similar file that contains a WCS header in its first extension.
  • field_scale (astropy.visualization.Interval object) – This sets the normalization for the FITS pixel values. It is an astropy.visualization Interval object. See http://docs.astropy.org/en/stable/visualization/normalization.html for details on scale and stretch objects.
  • field_stretch (astropy.visualization.Stretch object) – This sets the stretch function for mapping FITS pixel values to output pixel values. It is an astropy.visualization Stretch object. See http://docs.astropy.org/en/stable/visualization/normalization.html for details on scale and stretch objects.
  • field_colormap (matplotlib Colormap object) – This is the matplotlib colormap object to use for the output image.
  • field_findersize (None or tuple of two ints) – If this is None, the output image size will be set by the NAXIS1 and NAXIS2 keywords in the field_fitsfile FITS header. Otherwise, it must be a tuple with the intended x and y size of the image in inches (all output images use 100 DPI).
  • field_pltopts (dict) – This controls how the overlay points will be plotted. It is a dict with standard matplotlib marker kwargs as key-value pairs, e.g. ‘markersize’, ‘markerfacecolor’, etc. The default options draw red outline circles at the location of each object in the overlay.
  • field_grid (bool) – This sets whether a grid will be drawn on the output image.
  • field_gridcolor (str) – This sets the color of the grid lines. It is a standard matplotlib color spec string.
  • field_zoomcontain (bool) – This controls whether the finder chart will be zoomed to just contain the overlaid points. Everything outside the footprint of these points will be discarded.
  • copylcsto (str or None) – If this is provided, it is interpreted as a target directory into which all light curves matching the specified conditions will be copied.
Returns:

Returns a two-element tuple: (matching_object_lcfiles, matching_objectids) if conesearch and/or column filters are used. If xmatchexternal is also used, a three-element tuple is returned: (matching_object_lcfiles, matching_objectids, extcat_matched_objectids).

Return type:

tuple
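Each columnfilters string splits on ‘|’ into a column, an operator name, and an operand. The minimal sketch below maps the six documented operator names onto Python's operator module and applies the filters in sequence, combining them with AND. How the operand's type is inferred (int, then float, then str) is an assumption, and the function names are hypothetical.

```python
import operator

# the six operator names documented for columnfilters
OPERATORS = {
    'lt': operator.lt, 'gt': operator.gt,
    'le': operator.le, 'ge': operator.ge,
    'eq': operator.eq, 'ne': operator.ne,
}


def parse_operand(text):
    """Interpret the operand as int, then float, falling back to str."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_columnfilters(columns, columnfilters):
    """Return indices of rows passing every filter (combined with AND).

    columns is a dict mapping column name -> list of values.
    """
    nrows = len(next(iter(columns.values())))
    keep = list(range(nrows))
    for filt in columnfilters:
        colname, opname, operand = filt.split('|')
        op = OPERATORS[opname]
        value = parse_operand(operand)
        # each filter only narrows the rows that survived the previous ones
        keep = [i for i in keep if op(columns[colname][i], value)]
    return keep
```

For example, `apply_columnfilters(cols, ['ndet|gt|1000', 'sdssr|lt|15.0'])` keeps only rows with more than 1000 detections and an SDSS r magnitude brighter than 15.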

astrobase.lcproc.catalogs.add_cpinfo_to_lclist(checkplots, initial_lc_catalog, magcol, outfile, checkplotglob='checkplot*.pkl*', infokeys=[('comments', <class 'numpy.str_'>, False, True, '', ''), ('objectinfo.objecttags', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.twomassid', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.bmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.vmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.rmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.imag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.jmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.hmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.kmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.sdssu', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.sdssg', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.sdssr', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.sdssi', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.sdssz', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_bmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_vmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_rmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_imag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_jmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_hmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_kmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_sdssu', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_sdssg', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_sdssr', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_sdssi', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.dered_sdssz', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_bmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_vmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_rmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_imag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_jmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_hmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_kmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_sdssu', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_sdssg', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_sdssr', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_sdssi', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.extinction_sdssz', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.color_classes', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.pmra', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.pmdecl', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.propermotion', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.rpmj', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gl', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gb', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gaia_status', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.gaia_ids.0', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.gaiamag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gaia_parallax', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gaia_parallax_err', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.gaia_absmag', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.simbad_best_mainid', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.simbad_best_objtype', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.simbad_best_allids', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.simbad_best_distarcsec', <class 'numpy.float64'>, True, True, nan, nan), ('objectinfo.ticid', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.tic_version', <class 'numpy.str_'>, True, True, '', ''), ('objectinfo.tessmag', <class 'numpy.float64'>, True, True, nan, nan), ('varinfo.vartags', <class 'numpy.str_'>, False, True, '', ''), ('varinfo.varperiod', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.varepoch', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.varisperiodic', <class 'numpy.int64'>, False, True, 0, 0), ('varinfo.objectisvar', <class 'numpy.int64'>, False, True, 0, 0), ('varinfo.features.median', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.mad', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.stdev', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.mag_iqr', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.skew', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.kurtosis', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.stetsonj', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.stetsonk', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.eta_normal', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.linear_fit_slope', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.magnitude_ratio', <class 'numpy.float64'>, False, True, nan, nan), ('varinfo.features.beyond1std', <class 'numpy.float64'>, False, True, nan, nan)], nworkers=2)

This adds checkplot info to the initial light curve catalog generated by make_lclist.

This is used to incorporate all the extra info checkplots can have for objects back into columns in the light curve catalog produced by make_lclist. Objects are matched between the checkplots and the light curve catalog using their objectid. This then allows one to search this ‘augmented’ light curve catalog by these extra columns. The ‘augmented’ light curve catalog also forms the basis for the search interface provided by the LCC-Server.
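Matching checkplots back to catalog rows by objectid is essentially a keyed join. In this minimal sketch each checkplot is an already-loaded dict (reading real checkplot pickles is out of scope), catalog objects with no checkplot receive a fill value, and the function name and arguments are hypothetical rather than astrobase's API.

```python
def augment_catalog(catalog_objectids, checkplots, colname, extract, fill):
    """Build one new catalog column by joining checkplot info on objectid.

    catalog_objectids: objectids in catalog row order.
    checkplots: iterable of already-loaded checkplot dicts with an 'objectid' key.
    extract: function that pulls the value of interest out of a checkplot dict.
    fill: value for catalog objects that have no matching checkplot.
    """
    # index the extracted values by objectid so each row is one dict lookup
    by_objectid = {cpd['objectid']: extract(cpd) for cpd in checkplots}
    return {colname: [by_objectid.get(oid, fill) for oid in catalog_objectids]}
```

Keying on objectid keeps the join linear in the number of objects and preserves the row order of the initial catalog, regardless of the order in which checkplots are found.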

The default list of keys that will be extracted from a checkplot and added as columns in the initial light curve catalog is the CPINFO_DEFAULTKEYS list, shown above as the default value of the infokeys kwarg.

Parameters:
  • checkplots (str or list) – If this is a str, it is interpreted as a directory which will be searched for checkplot pickle files using checkplotglob. If this is a list, it will be interpreted as a list of checkplot pickle files to process.
  • initial_lc_catalog (str) – This is the path to the light curve catalog pickle made by make_lclist.
  • magcol (str) – This indicates the light curve magnitude column from which to extract magnitude-column-specific information. For example, Stetson variability indices can be generated using magnitude measurements in separate photometric apertures, which appear in separate magcols in the checkplot. To associate each such feature of the object with its specific magcol, pass that magcol in here. This magcol will then be added as a prefix to the resulting column in the ‘augmented’ LC catalog, e.g. Stetson J will appear as magcol1_stetsonj and magcol2_stetsonj for two separate magcols.
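  • outfile (str) – This is the file name of the output ‘augmented’ light curve catalog pickle file that will be written.
  • checkplotglob (str) – This is the UNIX filename glob used to find checkplot pickle files when checkplots is a directory.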
  • infokeys (list of tuples) –

    This is a list of keys to extract from the checkplot and some info on how this extraction is to be done. Each key entry is a six-element tuple of the following form:

    • key name in the checkplot
    • numpy dtype of the value of this key
    • False if key is associated with a magcol or True otherwise
    • False if subsequent updates to the same column name will append to existing key values in the output augmented light curve catalog or True if these will overwrite the existing key value
    • value to substitute for a None value of the key in the checkplot in the output light curve catalog column
    • value to substitute for a nan value of the key in the checkplot in the output light curve catalog column

    See the CPINFO_DEFAULTKEYS list (the default value of infokeys in the signature above) for examples.

  • nworkers (int) – The number of parallel workers to launch to extract checkplot information.
Returns:

Returns the path to the generated ‘augmented’ light curve catalog pickle file.

Return type:

str
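The six-element infokeys tuples can be illustrated with a minimal sketch of how one tuple might be applied to a single checkplot dict. This is not astrobase's code, and several details are assumptions: Python's str/float/int stand in for numpy.str_/numpy.float64/numpy.int64; the key is looked up as given and the magcol only prefixes the output column name, following the magcol1_stetsonj example above; magcol-independent keys keep the key itself as the column name; all-digit address parts such as the 0 in ‘objectinfo.gaia_ids.0’ are treated as list indices; and a missing key is handled like a None value. The function names are hypothetical.

```python
import math


def resolve(cpdict, address):
    """Follow a dotted address; all-digit parts are treated as list indices."""
    value = cpdict
    for part in address.split('.'):
        value = value[int(part)] if part.isdigit() else value[part]
    return value


def extract_infokey(cpdict, infokey, magcol):
    """Apply one six-element infokeys tuple to a checkplot dict.

    Returns (output column name, value). Python's str/float/int stand in
    for numpy types such as numpy.str_ and numpy.float64.
    """
    key, dtype, magcol_independent, _overwrite, none_fill, nan_fill = infokey
    # _overwrite (append vs. overwrite) only matters when the same output
    # column is updated more than once, so this single-value sketch ignores it

    # assumption: only the output column name gets the magcol prefix
    if magcol_independent:
        colname = key
    else:
        colname = '%s_%s' % (magcol, key.split('.')[-1])

    try:
        value = resolve(cpdict, key)
    except (KeyError, IndexError, TypeError):
        value = None  # treat a missing key like a None value

    if value is None:
        return colname, none_fill
    value = dtype(value)
    if isinstance(value, float) and math.isnan(value):
        return colname, nan_fill
    return colname, value
```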