gdoc_down package

Submodules

gdoc_down.core module

Save the content of a Google document to a local file.

Author: Jonathan Karr <karr@mssm.edu>
Date: 2016-08-16
Copyright: 2016, Karr Lab
License: MIT

class gdoc_down.core.GDocDown(credentials=None, service=None)

Bases: object

Downloads Google documents in several formats:

  • HTML (.html)
  • LaTeX (.tex)
  • Open Office document (.odt)
  • Plain text file (.txt)
  • Portable document format (.pdf)
  • Rich text document (.rtf)
  • Word document (.docx)

The class has several special features for handling LaTeX files:

  • The program ignores all images. This allows the user to place images inside the Google document for convenience and to use \includegraphics to embed images in the compiled PDF files.
  • The program will convert all Google document comments to PDF comments.
  • The program ignores all page breaks.

The first time the program is called, it requests access to the user’s Google account. This will create a client.json file.

credentials

oauth2client.client.OAuth2Credentials

Credentials object for OAuth 2.0.

service

apiclient.discovery.Resource

A Resource object with methods for interacting with the service

APPLICATION_NAME = 'gdoc_down'
CLIENT_SECRET_PATH = '/home/docs/checkouts/readthedocs.org/user_builds/gdoc-down/checkouts/0.0.5/gdoc_down/client.json'
CREDENTIAL_PATH = '/home/docs/.gdoc_down/auth.json'
SCOPES = ('https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/drive.file', 'https://www.googleapis.com/auth/drive.metadata.readonly', 'https://www.googleapis.com/auth/drive.readonly')
authenticate(credentials)

Authenticate with Google server

Returns: A Resource object with methods for interacting with the service
Return type: apiclient.discovery.Resource

static convert_html_to_latex(html_content)

Format Google document content downloaded in HTML format for LaTeX

  • Replace HTML characters with LaTeX commands
  • Remove images
  • Replace comments with PDF comments (using pdfcomment package)

Parameters: html_content (bytes) – HTML version of Google document
Returns: formatted LaTeX
Return type: bytes
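
The conversion steps listed above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the function name html_to_latex_sketch is hypothetical, the sketch assumes UTF-8 input, it maps only one HTML character (the non-breaking space) to a LaTeX command, and it does not attempt the comment-to-pdfcomment step.

```python
import html
import re


def html_to_latex_sketch(html_content):
    """Hypothetical helper: reduce exported HTML to LaTeX source text."""
    # Assumption: the exported HTML is UTF-8 encoded.
    text = html_content.decode('utf-8')
    # Remove images entirely.
    text = re.sub(r'<img\b[^>]*>', '', text, flags=re.IGNORECASE)
    # Treat the end of each HTML paragraph as a LaTeX paragraph break.
    text = re.sub(r'</p\s*>', '\n\n', text, flags=re.IGNORECASE)
    # Drop all remaining tags before decoding entities, so that text
    # such as "&lt;b&gt;" is not mistaken for markup.
    text = re.sub(r'<[^>]+>', '', text)
    # Decode HTML entities (&amp; -> &, &nbsp; -> U+00A0, ...).
    text = html.unescape(text)
    # Replace the non-breaking space with its LaTeX equivalent.
    text = text.replace('\u00a0', '~')
    return text.strip().encode('utf-8')


latex = html_to_latex_sketch(b'<p>\\section{Intro}</p><p>A \\&amp; B<img src="x.png"></p>')
```

The sketch leaves LaTeX commands typed in the document untouched, on the assumption that the Google document holds LaTeX source.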
download(gdoc_file, format='docx', out_path='.', extension=None)
Parameters:
  • gdoc_file (str) – path to Google document
  • format (str, optional) – desired output format (docx, html, odt, pdf, rtf, tex, txt)
  • out_path (str, optional) – path to save document
  • extension (str, optional) – extension to document
Raises: Exception – if the format is unknown, or if an output file path and an extension are both specified (they cannot both be specified)
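
One way to picture the format check is as a lookup from the format name to an export MIME type. The sketch below is an assumption about how such a check could look, not the package's code: EXPORT_MIME_TYPES and export_mime_type are hypothetical names, the sketch assumes that tex output starts from the HTML export (as the signature of convert_html_to_latex suggests), and it raises ValueError, a subclass of Exception.

```python
# Hypothetical mapping from the documented format names to the MIME
# types that Google Drive can export a Google document as.
EXPORT_MIME_TYPES = {
    'docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'html': 'text/html',
    'odt': 'application/vnd.oasis.opendocument.text',
    'pdf': 'application/pdf',
    'rtf': 'application/rtf',
    'tex': 'text/html',  # assumption: LaTeX is derived from the HTML export
    'txt': 'text/plain',
}


def export_mime_type(format='docx'):
    """Hypothetical helper: return the export MIME type for a format name."""
    try:
        return EXPORT_MIME_TYPES[format]
    except KeyError:
        known = ', '.join(sorted(EXPORT_MIME_TYPES))
        raise ValueError(
            'Unknown format {!r}; expected one of: {}'.format(format, known))


print(export_mime_type('pdf'))  # prints application/pdf
```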

get_credentials()

Get and save user credentials from Google. If credentials haven’t already been stored, or if the stored credentials are invalid, obtain new credentials.

Returns: Credentials object for OAuth 2.0.
Return type: oauth2client.client.OAuth2Credentials

static get_element_text(element)

Get all of the text underneath an XML element

Parameters: element (xml.etree.ElementTree.Element) – XML element
Returns: element’s text
Return type: str
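
For illustration, collecting all of the text beneath an element can be done with the standard library's Element.itertext(). This is a sketch of the idea, not necessarily how the package implements it; element_text_sketch is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def element_text_sketch(element):
    """Hypothetical helper: concatenate all text beneath an XML element."""
    # itertext() yields the element's own text plus the text and tail
    # of every descendant, in document order.
    return ''.join(element.itertext())


root = ET.fromstring('<p>Hello <b>bold</b> world</p>')
print(element_text_sketch(root))  # prints Hello bold world
```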

static get_gdoc_id(gdoc_file)

Get Google document id

Parameters: gdoc_file (str) – path to Google document
Returns: id of Google document
Return type: str
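
A minimal sketch of reading the id, assuming that the local .gdoc file is a small JSON file with a doc_id field (this file layout is an assumption and is not stated in this reference). The name read_gdoc_id_sketch is hypothetical.

```python
import json


def read_gdoc_id_sketch(gdoc_file):
    """Hypothetical helper: read the document id from a local .gdoc file."""
    # Assumption: the .gdoc file is JSON and stores the id under "doc_id".
    with open(gdoc_file, 'r') as file:
        return json.load(file)['doc_id']
```

Under this assumption, a file missing the doc_id field raises KeyError, and a file that is not valid JSON raises json.JSONDecodeError.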

Module contents