Working with Files in Python

This article shows how to use Python to read and write text, CSV, JSON, and binary files. It also covers the io module (including StringIO) and the json, csv, and pickle modules, which you can use to open, read, and write files in different formats.

As a cloud automation engineer, you’ll deal with data processing tasks daily. The data could come from various sources, such as files, databases, and object storage (S3, for example). The most common formats from our point of view are .txt, .json, .csv, .parquet, .pickle, and various image formats. Before dealing with any data, you must know how to open, read, and write files.

Flat vs. non-flat files

There are two types of files:

  • Flat files – are the files where:
    • The file structure is simple
    • All records are stored in the same format
    • There’s no information for indexing or relationships between stored records
    • Examples are: a plain text file or a binary file (CSV, JSON, images, etc.)
  • Non-flat files – are the files which:
    • Usually contain metadata describing the file
    • Contain mixed data
    • May have processing rules that depend on the type of the file content or descriptive records
    • Can be extremely complex to process
    • An example of such a file type is an XML document
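
To make the distinction more concrete, here is a minimal sketch that parses the same records from a flat CSV text and from a nested XML document. The sample data (the product records and the `updated` attribute) is made up for illustration; the sketch only uses the standard csv and xml.etree.ElementTree modules.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Flat data: every record has the same format, one record per line
flat_text = 'id,product_name\n1,Pen\n2,Table\n'
flat_records = list(csv.reader(io.StringIO(flat_text)))

# Non-flat data: nested elements plus metadata describing the document
xml_text = (
    '<catalog updated="2021-07-20">'
    '<product id="1"><name>Pen</name></product>'
    '<product id="2"><name>Table</name></product>'
    '</catalog>'
)
root = ET.fromstring(xml_text)
metadata = root.get('updated')
nested_records = [
    (product.get('id'), product.findtext('name'))
    for product in root.iter('product')
]

print(flat_records)
print(metadata, nested_records)
```

The flat file can be processed line by line, while the XML document has to be parsed into a tree before its records and metadata can be reached.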

Python file objects

Python has built-in functions to create, read, and write files. The io module implements file handling, and its open() function is available as a built-in, so you can use it without importing anything. Before reading or writing a file, you need to open it by calling open(), which returns a file object (also called a file handler). You then use this handler to read data from the file or write data to it. After finishing all file operations, you should close the file by calling the handler’s close() method.

An OSError exception (IOError is an alias of OSError in Python 3) is raised if a file operation fails.

Opening a file in Python

Before reading from or writing to a file, we need to open it. The built-in open(filename, access_mode) function returns a file object (handler). If the access mode is not specified, the file is opened as a text file in read-only mode.

Here’s the simplest example of opening a text file in read-only mode:

file_obj = open('hands-on-cloud.txt')

The statement above will try to open a file in the current working directory of your Python process. You can specify a full file path to open a file from another location on your file system.

If the file does not exist or you specified the wrong location, you’ll get a FileNotFoundError exception:

Traceback (most recent call last):
  File "files.py", line 1, in <module>
    with open('/wrong/path/hands-on-cloud.txt') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/wrong/path/hands-on-cloud.txt'
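
If a missing file is an expected situation, you can catch the exception instead of letting the program stop. The following is a minimal sketch; the helper name read_text_or_default and the default value are hypothetical and used only for illustration.

```python
def read_text_or_default(path, default=''):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        with open(path, encoding='utf-8') as f:
            return f.read()
    except FileNotFoundError:
        # The path does not exist, so fall back to the default value
        return default


content = read_text_or_default('/wrong/path/hands-on-cloud.txt', default='<missing>')
print(content)
```

Catching FileNotFoundError specifically, rather than every exception, keeps other problems (for example, a permission error) visible.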

Python file open modes

Access modes determine the type of access (write, read, or read-write) and the type of file (text or binary).

The common access_modes are:

  • r – Default mode opens a text file for reading.
  • w – Opens a text file for writing.
    • If the file does not exist, then it creates a new one.
    • If the file already exists, then it truncates it.
  • a – Opens a text file to append text to the end of the file.
    • If the file does not exist, then it creates a new one.
    • If the file exists, then it does not truncate it.
  • + – Added to another mode (for example, r+), allows opening the file for both reading and writing.
  • b – Added to another mode, opens the file in binary mode.
  • rb – Opens the file in binary mode for reading.
  • wb – Opens the file in binary mode for writing.
  • ab – Same as a in text mode, but for binary mode.

Here are some examples:

# open file "file.txt" to read
file_obj = open("file.txt", 'r')

# open file "file.txt" to write
file_obj = open("file.txt", 'w')

# open file "file.txt" read and write
file_obj = open("file.txt", 'r+')

# open file "test.bmp" in binary mode to read
file_obj = open("test.bmp", 'rb')
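
To see how the modes differ in practice, here is a small sketch that writes to the same file using the w, a, and r+ modes. It works in a temporary directory, and the file name modes-demo.txt is a made-up example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes-demo.txt')

    # 'w' creates the file
    with open(path, 'w') as f:
        f.write('first\n')

    # 'w' again truncates the existing content
    with open(path, 'w') as f:
        f.write('second\n')

    # 'a' keeps the content and appends to the end
    with open(path, 'a') as f:
        f.write('third\n')

    # 'r+' allows reading and writing without truncating
    with open(path, 'r+') as f:
        existing = f.read()   # the cursor is now at the end of the file
        f.write('fourth\n')

    with open(path) as f:
        final_content = f.read()

print(repr(existing))
print(repr(final_content))
```

The first line written in w mode is lost after the second open in w mode, while the a and r+ modes preserve what is already in the file.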

Closing a File in Python

After performing the actions with the file, we have to close it by calling the close() method of the handler. Closing a file flushes any buffered data to disk and releases the operating system resources (such as the file descriptor) associated with it. That is important when you need to process many files. There are several ways to close files.

Explicit file close operation in Python

In this method, you close the file by explicitly calling the close() method of the file handler.

# open file 'file.txt' to read
file_obj = open('file.txt')

# performing file operations

# explicit file closing
file_obj.close()

The code example above does not guarantee that Python will close the file because exceptions may arise during file operations.

Another approach is to use the try-finally block, which guarantees that Python will close the file even if an exception is raised during file operations.

# open the file to read; if open() itself fails,
# there is no file object to close
file_obj = open('file.txt')
try:
    # perform file operations
    file_content = file_obj.read()
finally:
    # close the file
    file_obj.close()

Implicit file close operation in Python

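We can simplify the code block above by using the with statement, which wraps the executed block with methods defined by a context manager. The file object returned by open() is a context manager that closes the file when the block exits, even if an exception is raised. This lets you encapsulate and reuse the try-finally pattern.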

with open('file.txt') as f:
    print(f.read())
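
To check that the with statement really closes the file when something goes wrong, you can inspect the closed attribute of the file object. This is a minimal sketch; the ValueError is raised on purpose to simulate a processing error, and the file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'with-demo.txt')
    with open(path, 'w') as f:
        f.write('some text\n')

    try:
        with open(path) as f:
            # Simulate a failure in the middle of file processing
            raise ValueError('simulated processing error')
    except ValueError:
        pass

    # The name f is still bound, but the file itself is already closed
    closed_after_error = f.closed

print(closed_after_error)
```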

Additional information on the topic is available in Python Coding Tip: Using the ‘With’ Statement Instead ‘Try…Finally’ article.

Reading files in Python

After opening the file in Python, we can read data from it. The file handler has several methods that read the file content in different ways:

  • read() – reads the entire file content
  • readline() – reads a single line from the file
  • readlines() – reads all lines from the file into a list

Let’s review them one by one.

Reading text file to a string in Python

To read a file’s contents, call the read(size) method of the file handler object. It reads some quantity of data and returns it as a string in text mode or as a bytes object in binary mode. size is an optional numeric argument that determines how much data to read (the entire file is read if it is omitted or negative).

Here is the example:

with open('file.txt') as f:
    file_content = f.read()
    print(file_content)

In the example above, we open an existing text file, read it entirely into the file_content variable, and then print it.
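
When a file is too large to load at once, the size argument lets you read it piece by piece. Here is a minimal sketch of that idea; the CHUNK_SIZE value and the file name are arbitrary assumptions chosen to keep the example small.

```python
import os
import tempfile

CHUNK_SIZE = 8  # characters per read() call; an arbitrary value for the demo

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'chunks-demo.txt')
    with open(path, 'w') as f:
        f.write('0123456789' * 2)  # 20 characters in total

    chunks = []
    with open(path) as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                # An empty string means the end of the file was reached
                break
            chunks.append(chunk)

print(chunks)
```

In real code, a much larger chunk size (tens of kilobytes or more) is a more typical choice.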

Reading a single line from a text file in Python

To read a single line from the file, use the readline() method of the file handler. Note that this method determines the end of the line by the special \n (newline) character, and the returned result will contain that \n character.

with open('file.txt') as f:
    line = f.readline()
    print(line)

The example output:

Line 1
      <==== Additional new line is printed here

If you’d like to remove the \n character from the line, call the string’s strip() method (it removes all leading and trailing whitespace, including the newline):

with open('file.txt') as f:
    line = f.readline()
    print(line.strip())

Reading text file to an array (list of strings) in Python

The readlines() file handler method allows you to read the entire file (all lines at once) and returns the lines as a list object.

Here is the example:

with open('file.txt') as f:
    lines = f.readlines()
    for line in lines:
        print(line)

In the example above, we’re reading the entire file to a list object.

Note: Python will load the whole file content into memory.

Note: every element of the lines list will contain a \n character.

Reading text files line by line in Python

A more memory-efficient way of reading text files is doing it line by line. You can do it by applying a for loop to the file handler object:

with open('file.txt') as f:
    for line in f:
        print(line)

Note: the content of the line variable will contain a \n character.
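
A common variation of this loop numbers the lines and removes the trailing newline from each of them. The sketch below assumes a small three-line file created just for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'lines-demo.txt')
    with open(path, 'w') as f:
        f.write('Line 1\nLine 2\nLine 3\n')

    numbered = []
    with open(path) as f:
        # enumerate() adds a counter; only one line is held in memory at a time
        for number, line in enumerate(f, start=1):
            clean_line = line.rstrip('\n')  # drop only the trailing newline
            numbered.append(f'{number}: {clean_line}')

print(numbered)
```

Unlike strip(), rstrip('\n') leaves leading spaces and other whitespace in the line untouched.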

Reading binary files in Python

To read a binary file, open it in the rb access mode. The read() method then returns a bytes object instead of a string:

with open('file.pdf', 'rb') as f:
    line = f.read()
    print(line)

Here’s what the output of the code looks like:

b'%PDF-1.4\n%\xb5\xed\xae\xfb\n4 0 obj\n<< /Length 5 0 R\n   /Filter /FlateDecode\n>>\nstream\nx\x9cER[\x8e\x1cA\x08\xfb\xafSp\x81\xb0@=\xa8:F\x8e\x10\xb5\x94\xcd\xc7\xcc\xc7&\xf7\x97b\xd3\xb3\xbb\x9aQ\x81\xe9.cL\xbb\x18~?\x1c\xc74\xb9\x9e\xed\xa3Y\x95\xfe\xbe\xcb\xdb/\x93\xf7\x7f-L\xb6<\xc5\xbb\xce~\x90n\xed+\t\xdd\x00\xd5\xce\x10?:\xe5\x92\xae\xc3\xb7\x84if\x82"\x86\x9e\xb3\x90t0\x0bx\xfbR\x1b)(\xcf.\xc3\xc5\x9d\xe7\xc5\x03\xff\x87\x8c\xa9gL\xe6\xb3^&`\x16x\x0716Z\x86\xb4\x91:\x10A\x9f\x07<]\x8f-\x89\xae+(\x02x\x9d)\x9e\x9a+\xa4o=h\xd4\xa5w\x9e\xd0\xe8z\xd6]1\xc8\x8b\xc6IbnB\xeb~\x8fIQ\x91\xba7.\x99\xc6\xaaf\xe1\x9c\xbe\xdc\xb8\xe4\x0fmq\x83/h\x9c\x91\xccc\x91P<\xca\x13\xa8M\xdc\xf0Y\x08|G}.\xe2\xed\x81a\x81O\xdf\xd2P`\xc3G\xe9A\x01\xd8\xd7.\x9dP3Kx\x99\x9be\x1b\xf2\x114\xbc\x9c\xa40\xce\x8cX}\x1a]\x0c\xcd\xed,\xc4~\xb9\x9a\xb8\x05|\x12v\xedR\x96L\xcaXFh\x9a\xe0\x1d\x15\xb0\xe4\xe1\x8d\xe1\xe2\x86\x10\x1fr/\x0e)\x1c\xbd\xd7\x17\xf7V\xab\xf1\xca\x10\xb4\x9fx\xec\xb5r\x8c\xba7\x01B#\x8d\x11X\x8d\xc9\xdc\xee\xe8CG\x1f_\xa1\x1e\xb8\xdd\xee\x0e\x82\'\xc7\x83\x13s\xd11\x0cMG\xb1\x98{0\xdf\x95\xbc\x8c~\xd4\x8e\x80!\xf0\x85\xa74\x8c6\xf0u\x10T\xf5\xeb!X\x8b\xec\xf32\xdc/\xb2O\xf2X\xdf\x8d\x91\xb7\xd2\xfc\xad\xebw\xfb\xd9\xfe\x03@\x1c\x9e\xdf\nendstream\nendobj\n5 0 obj\n   402\nendobj\n3 0 obj\n<<\n   /ExtGState <<\n      /a0 << /CA 1 /ca 1 >>\n   >>\n>>\nendobj\n2 0 obj\n<< /Type /Page % 1\n   /Parent 1 0 R\n   /MediaBox [ 0 0 50 50 ]\n   /Contents 4 0 R\n   /Group <<\n      /Type /Group\n      /S /Transparency\n      /I true\n      /CS /DeviceRGB\n   >>\n   /Resources 3 0 R\n>>\nendobj\n1 0 obj\n<< /Type /Pages\n   /Kids [ 2 0 R ]\n   /Count 1\n>>\nendobj\n6 0 obj\n<< /Producer (cairo 1.16.0 (https://cairographics.org))\n   /CreationDate (D:20210720174858Z)\n>>\nendobj\n7 0 obj\n<< /Type /Catalog\n   /Pages 1 0 R\n>>\nendobj\nxref\n0 8\n0000000000 65535 f \n0000000804 00000 n \n0000000588 00000 n \n0000000516 00000 n \n0000000015 00000 n \n0000000494 00000 n 
\n0000000869 00000 n \n0000000980 00000 n \ntrailer\n<< /Size 8\n   /Root 7 0 R\n   /Info 6 0 R\n>>\nstartxref\n1032\n%%EOF\n'
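
You rarely need to print a whole binary file. A more typical task is to read only the first few bytes, for example, to check the file signature. The output above starts with %PDF-, and the sketch below uses that prefix to guess whether a file is a PDF document. The helper name looks_like_pdf and the sample files are hypothetical, and the sample "PDF" is only a stand-in, not a valid document.

```python
import os
import tempfile

PDF_MAGIC = b'%PDF-'


def looks_like_pdf(path):
    """Return True if the file starts with the PDF signature bytes."""
    with open(path, 'rb') as f:
        header = f.read(len(PDF_MAGIC))  # read only the first few bytes
    return header == PDF_MAGIC


with tempfile.TemporaryDirectory() as tmp_dir:
    pdf_path = os.path.join(tmp_dir, 'sample.pdf')
    txt_path = os.path.join(tmp_dir, 'sample.txt')

    with open(pdf_path, 'wb') as f:
        f.write(b'%PDF-1.4\n%%EOF\n')
    with open(txt_path, 'wb') as f:
        f.write(b'plain text')

    pdf_result = looks_like_pdf(pdf_path)
    txt_result = looks_like_pdf(txt_path)

print(pdf_result, txt_result)
```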

Writing files in Python

You can use several methods to write data to a file in Python:

  • write(string) – writes a text string to a file opened in text mode
  • write(bytes_object) – writes a bytes object to a file opened in binary mode
  • writelines(strings_list) – writes all strings from a list to the file

Let’s use those methods to write information to a file.

Writing a single string line to a file in Python

To write a single line to a file, open the file for writing and call the write() method. Here is an example:

with open('file.txt', 'w') as f:
    f.write('Hands-On.Cloud!\n')
    f.write('Hands-On.Cloud!\n')

The code above writes the string Hands-On.Cloud! to the file.txt file twice, each time on its own line.

Note: you have to add a \n character to the end of each string to write it as a new line.

Writing strings to a file from an array in Python

The writelines() method allows us to write multiple lines from a list of strings into a file:

with open('file.txt', 'w') as f:
    lines = [
        'I am writing first line. This is still first line.\n',
        'Now, I am on second line.\n',
        'This is 3rd\n'
    ]

    f.writelines(lines)

Note: you have to add a \n character to the end of each string in the list to write each element on a new line.

Here’s the resulting content of the file.txt file:

I am writing first line. This is still first line.
Now, I am on second line.
This is 3rd
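
If your strings do not end with \n yet, you do not have to modify the list first: writelines() accepts any iterable of strings, so you can add the newline on the fly. A minimal sketch, with a made-up list of items:

```python
import os
import tempfile

items = ['first', 'second', 'third']  # strings without trailing newlines

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'writelines-demo.txt')

    with open(path, 'w') as f:
        # The generator appends '\n' to every item as it is written
        f.writelines(f'{item}\n' for item in items)

    with open(path) as f:
        written = f.read()

print(repr(written))
```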

Appending strings to a file in Python

The w access mode overwrites the file if it already exists. To append text to the file, you have to use the a access mode:

with open('file.txt', 'a') as f:
    lines = [
        'Appended line 1\n',
        'Appended line 2\n'
    ]

    f.writelines(lines)

Note: you have to add a \n character to the end of each string in the list to write each element on a new line.

Since the file.txt file already exists, executing the code above will append two additional lines to it:

I am writing first line. This is still first line.
Now, I am on second line.
This is 3rd
Appended line 1
Appended line 2

Moving file cursor position in Python

The file object has two methods that allow you to manage the cursor position during file operations in Python: seek() and tell():

  • seek() method sets the position of a file pointer
  • tell() method returns the current position of a file pointer

Let’s say you have the file.txt file with these two lines in it:

The seek() and tell() example
From hands-on.cloud

The seek() method in Python

The seek() method has the following syntax:

seek(offset, whence)

The parameters are:

  • offset – the number of bytes to move the cursor by; it can be positive or negative
  • whence – tells Python which cursor location to use as a starting point
    • 0 – beginning of the file (the default); offset should be zero or positive
    • 1 – current cursor location; offset may be positive or negative
    • 2 – end of the file; offset is usually negative

Note: the seek() method supports nonzero offsets from the current position (whence=1) and from the end of the stream (whence=2) only in binary mode. For a file opened in text mode, you’ll get an io.UnsupportedOperation error, for example: io.UnsupportedOperation: can't do nonzero cur-relative seeks.
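
The difference between text and binary mode can be checked with a short experiment. The sketch below creates the two-line example file in a temporary directory (the file name is an assumption) and tries the same end-relative seek in both modes.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'seek-demo.txt')
    with open(path, 'wb') as f:
        f.write(b'The seek() and tell() example\nFrom hands-on.cloud')

    # Text mode: a nonzero offset relative to the end is rejected
    with open(path) as f:
        try:
            f.seek(-10, 2)
            text_error = None
        except io.UnsupportedOperation as error:
            text_error = str(error)

    # Binary mode: the same call works
    with open(path, 'rb') as f:
        f.seek(-10, 2)
        tail = f.read()

print(text_error)
print(tail)
```

A zero offset, such as f.seek(0, 2), is still allowed in text mode and moves the cursor to the end of the file.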

Here are a few seek() usage examples:

seek() example    Explanation
f.seek(0)         Move the file cursor to the beginning of the file
f.seek(0, 2)      Move the file cursor to the end of the file
f.seek(10)        Move the file cursor ten bytes ahead from the beginning of the file
f.seek(10, 1)     Move the file cursor ten bytes ahead from the current position
f.seek(-3, 1)     Move the file cursor three bytes back from the current position
f.seek(-10, 2)    Move the file cursor to ten bytes before the end of the file

Table: seek() usage examples in Python

Let’s illustrate the results of seek() execution:

with open('file.txt', 'rb') as f:

    # Move file cursor to the beginning of the file
    print(f'Current cursor position: {f.tell()}')
    f.seek(0)
    print(f'seek(0) & readline() output: {f.readline()}\n')

    # Move file cursor to the end of a file
    print(f'Current cursor position: {f.tell()}')
    f.seek(0, 2)
    print(f'seek(0, 2) & readline() output: {f.readline()}\n')

    # Move file pointer ten characters ahead
    # from the beginning of a file.
    print(f'Current cursor position: {f.tell()}')
    f.seek(10)
    print(f'seek(10) & readline() output: {f.readline()}\n')

    # Move file pointer ten characters ahead
    # from the current position.
    print(f'Current cursor position: {f.tell()}')
    f.seek(10, 1)
    print(f'seek(10, 1) & readline() output: {f.readline()}\n')

    # Move file pointer three characters behind
    # from the current position.
    print(f'Current cursor position: {f.tell()}')
    f.seek(-3, 1)
    print(f'seek(-3, 1) & readline() output: {f.readline()}\n')

    # Move file cursor ten characters before
    # the end of the file
    print(f'Current cursor position: {f.tell()}')
    f.seek(-10, 2)
    print(f'seek(-10, 2) & readline() output: {f.readline()}\n')

The expected output will be:

Current cursor position: 0
seek(0) & readline() output: b'The seek() and tell() example\n'

Current cursor position: 30
# Position 30 is the end of the first line.
# seek(0, 2) then moves the cursor to the end of the file,
# so readline() has nothing left to read and returns b''
seek(0, 2) & readline() output: b''

Current cursor position: 49
seek(10) & readline() output: b' and tell() example\n'

Current cursor position: 30
seek(10, 1) & readline() output: b'-on.cloud'

Current cursor position: 49
seek(-3, 1) & readline() output: b'oud'

Current cursor position: 49
seek(-10, 2) & readline() output: b's-on.cloud'

The tell() method in Python

As you’ve probably noticed from the previous example, the tell() method returns the current cursor position in the file:

with open('file.txt') as f:
    # The cursor is at the beginning of a file
    pos = f.tell()
    print(f'Current cursor position: {pos}')

    # Move file pointer ten characters ahead
    # from the beginning of a file.
    f.seek(10)

    pos = f.tell()
    print(f'Current cursor position: {pos}')

Example of the script output:

Current cursor position: 0
Current cursor position: 10
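
One practical use of tell() is to remember where each line starts so that you can jump straight back to it later with seek(). The following is a rough sketch of that idea using the same two-line example content; the file is created in a temporary directory for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tell-demo.txt')
    with open(path, 'wb') as f:
        f.write(b'The seek() and tell() example\nFrom hands-on.cloud')

    with open(path, 'rb') as f:
        # Record the starting byte offset of every line
        offsets = []
        while True:
            position = f.tell()
            line = f.readline()
            if not line:
                break
            offsets.append(position)

        # Jump directly to the beginning of the second line
        f.seek(offsets[1])
        second_line = f.readline()

print(offsets)
print(second_line)
```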

FAQ

How to create a file in Python

The built-in open() function with the access mode w or x allows you to create an empty file:

# create or recreate an empty file
with open('file.txt', 'w') as f:
    pass

# create an empty file
# if the file exists, this call will
# generate FileExistsError exception
with open('file.txt', 'x') as f:
    pass
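
The x mode is handy when you must not overwrite an existing file. Here is a minimal sketch that wraps it in a helper; the name create_if_missing is hypothetical.

```python
import os
import tempfile


def create_if_missing(path):
    """Create an empty file; return False if it already exists."""
    try:
        with open(path, 'x'):
            pass
        return True
    except FileExistsError:
        # Another call (or another process) has already created the file
        return False


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'create-demo.txt')
    first_attempt = create_if_missing(path)
    second_attempt = create_if_missing(path)

print(first_attempt, second_attempt)
```

Compared with checking for the file first and then opening it with w, this approach leaves no gap between the check and the creation.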

How to read and write files in memory in Python

One commonly asked question is how to read or write files in memory in Python. To do that, you can use the StringIO class from the io module, which supports the same methods as the file handler object:

import io

with io.StringIO() as f:
    f.write('First line.\n')
    f.write('Second line.\n')

    f.seek(0)
    line = f.readline()
    print(f'Line: {line}')

    f.seek(0)
    lines = f.readlines()
    print(f'All lines: {lines}')

    f.seek(0)
    contents = f.read()
    print(f'\nFull file content:\n{contents}')

The expected output will look like this:

Line: First line.

All lines: ['First line.\n', 'Second line.\n']

Full file content:
First line.
Second line.
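
StringIO works with text. For binary data, the io module provides a similar BytesIO class. A minimal sketch with arbitrary sample bytes:

```python
import io

with io.BytesIO() as buffer:
    buffer.write(b'\x00\x01\x02\x03')
    buffer.write(b'\xff')

    size = buffer.tell()       # the cursor is at the end after writing

    buffer.seek(1)
    middle = buffer.read(2)    # two bytes starting at offset 1

    # getvalue() returns the whole content regardless of the cursor position
    everything = buffer.getvalue()

print(size, middle, everything)
```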

How to print to file in Python

In Python, it is possible to use the print() function to print data to a file. To do this, you have to use its file argument:

import io

with io.StringIO() as f:
    # print() appends a newline to each call by default,
    # so the strings do not need their own \n
    print('First line.', file=f)
    print('Second line.', file=f)

    f.seek(0)
    line = f.readline()
    print(f'Line: {line}')

    f.seek(0)
    lines = f.readlines()
    print(f'All lines: {lines}')

    # Retrieve file contents -- this will be
    # 'First line.\nSecond line.\n'
    f.seek(0)
    contents = f.read()
    print(f'\nFull file content:\n{contents}')
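
The print() function also accepts the sep and end arguments, which control what is written between the values and after the last one. The sketch below uses them to build a small comma-separated text in memory; the values are made up.

```python
import io

with io.StringIO() as f:
    # sep replaces the default space between the printed values
    print('id', 'product_name', sep=',', file=f)
    print(1, 'Pen', sep=',', file=f)

    # end replaces the default trailing newline
    print('no newline', end='', file=f)

    contents = f.getvalue()

print(repr(contents))
```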

How to read and write JSON files in Python

Python provides a useful json module that allows you to work with JSON data. You can use this module to read and write JSON data to and from files:

import json

data = {
    'products': [
        {'id': 1, 'product_name': 'Pen'},
        {'id': 2, 'product_name': 'Table'},
        {'id': 3, 'product_name': 'Laptop'},
    ]
}

# writing JSON data to file
with open('file.json', 'w') as f:
    json.dump(data, f)

# reading JSON data from file
with open('file.json', 'r') as f:
    read_data = json.load(f)

print(f'Read data: {read_data}')

In the example above, we saved a Python dictionary to a file as JSON data and then read it back.
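
The json module can also work with strings instead of files by using dumps() and loads(), and the indent argument makes the output easier to read. The sketch below also shows one detail worth knowing: JSON object keys are always strings, so integer dictionary keys do not survive a round trip. The sample data is made up.

```python
import json

data = {'products': [{'id': 1, 'product_name': 'Pen'}]}

# Serialize to a human-readable string and parse it back
text = json.dumps(data, indent=2, sort_keys=True)
restored = json.loads(text)

# Integer keys are converted to strings by the JSON format
round_trip = json.loads(json.dumps({1: 'Pen'}))

print(text)
print(restored == data)
print(round_trip)
```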

How to read and write CSV files in Python

Python provides a useful csv module that allows you to process CSV data. You can use this module to read and write CSV data to and from files:

import csv

data = [
    ['id', 'product_name'],
    [1, 'Pen'],
    [2, 'Table'],
    [3, 'Laptop']
]

# writing CSV data to file
# (newline='' lets the csv module control line endings)
with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(data)

read_data = []
# reading CSV data from file
with open('file.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        read_data.append(row)

print(f'Read data: {read_data}')
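
If your records are dictionaries rather than lists, the csv module’s DictWriter class can write them directly. This is a minimal sketch with made-up rows, using a temporary directory; note that the values come back as strings when you read the file.

```python
import csv
import os
import tempfile

rows = [
    {'id': 1, 'product_name': 'Pen'},
    {'id': 2, 'product_name': 'Table'},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'dict-demo.csv')

    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['id', 'product_name'])
        writer.writeheader()    # writes the column names as the first row
        writer.writerows(rows)

    with open(path, newline='') as f:
        read_rows = list(csv.DictReader(f))

print(read_rows)
```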

How to store and retrieve any type of data in Python

To store Python objects, including instances of your own classes, and retrieve them between program executions, you can use the pickle module:

import pickle


class Person():
    def __init__(self, age, name):
        self.age = age
        self.name = name

    def __str__(self):
        return f'class Person({self.age}, {self.name})'


person_1 = Person(35, 'Joe')
person_2 = Person(42, 'Jane')

data = [
    person_1, person_2
]

# persisting data to disk
with open('file.pickle', 'wb') as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)


# reading data from disk
read_data = []
with open('file.pickle', 'rb') as f:
    read_data = pickle.load(f)

for p in read_data:
    print(f'{p}')

In the example above, we did the following:

  • Defined a custom Python class (Person)
  • Instantiated class objects (person_1 and person_2)
  • Added class objects to a Python list
  • Persisted (saved) a list to .pickle file
  • Loaded list of class objects from the .pickle file

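You can use this approach to save and load most Python objects. Keep in mind that some objects (for example, lambda functions or open file handlers) cannot be pickled, and that you should only unpickle data from sources you trust, because loading a pickle file can execute arbitrary code.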
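
The pickle module can also serialize to and from bytes in memory with dumps() and loads(). The sketch below round-trips a dictionary with made-up values and then shows what happens with an object that pickle cannot handle. The exact exception type is treated as an assumption here, so the code catches both PicklingError and AttributeError.

```python
import pickle

payload = {'name': 'Joe', 'tags': {'a', 'b'}, 'point': (1, 2)}

# Sets and tuples keep their types, unlike with JSON
restored = pickle.loads(pickle.dumps(payload))

try:
    pickle.dumps(lambda x: x + 1)
    lambda_error = None
except (pickle.PicklingError, AttributeError) as error:
    # Functions are pickled by name, and a lambda has no importable name
    lambda_error = type(error).__name__

print(restored == payload)
print(lambda_error)
```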

Summary

This article covered how to use Python to read and write text, CSV, JSON, and binary files, including persisting custom object data.
