Working with Files in Python

Usama Hassan


From this article, you’ll learn how to use Python to read and write text, CSV, JSON, and binary files. We’ll also cover how to use the Python io, json, csv, and pickle modules (including io.StringIO) to open, read, and write files, and more.

As a cloud automation engineer, you’ll deal with data processing tasks daily. The data can come from various sources, such as files, databases, and object storage (S3, for example). From our point of view, the most common formats are .txt, .json, .csv, .parquet, .pickle, and various image formats. Before you start dealing with any data, you have to know how to open, read, and write files.

Flat vs. non-flat files

There are two types of files:

  • Flat files are files where:
    • The file structure is simple
    • All records are stored in the same format
    • There’s no information for indexing or relationships between stored records
    • Examples are: a plain text file (CSV, JSON) or a binary file (images, etc.)
  • Non-flat files are files that:
    • Usually contain metadata describing the file
    • Contain mixed data
    • May have processing rules that depend on the type of the file content or descriptive records
    • Can be extremely complex to process
    • Example of such a file type is an XML document

Python file objects

Python has built-in functions to create, read, and write files. The io module is the default module for managing files, and its open() function is available as a built-in, so you can use it without importing anything. Before reading or writing a file, you need to open it with the open() function, which returns a file object, also called a handler. You then use this handler to read information from the file or write to it. After finishing all file operations, you should close the file using the handler’s close() method.

An OSError exception is raised if any of the file operations fail (in Python 3, IOError is an alias of OSError).

Opening a file in Python

Before reading from or writing to a file, we need to open it. Python’s built-in function open(filename, access_mode) opens a file and returns a file object (handler). If the access mode is not specified, the file is opened in read-only text mode.

Here’s the simplest example of opening a text file in read-only mode:

file_obj = open('hands-on-cloud.txt')

The statement above tries to open the file in the current working directory of the Python process. You can specify a full file path to open a file from another location on your file system.

If the file does not exist or you specified the wrong location, you’ll get a FileNotFoundError exception:

Traceback (most recent call last):
  File "files.py", line 1, in <module>
    with open('/wrong/path/hands-on-cloud.txt') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/wrong/path/hands-on-cloud.txt'
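
If a missing file is an expected situation, the exception can be caught instead of letting the program stop. The sketch below is only an illustration: the helper name read_text_or_default and its default parameter are made up for this example and are not part of Python.

```python
def read_text_or_default(path, default=''):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        # Raised when the path does not point to an existing file
        return default


# The path below is deliberately wrong, so the default value is returned
content = read_text_or_default('/wrong/path/hands-on-cloud.txt', default='<no file>')
print(content)
```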

Python file open modes

Access modes determine the type of access (write, read, or read-write) and the type of file (text or binary).

The common access modes are:

  • r – Default mode, opens a text file for reading.
  • w – Opens a text file for writing.
    • If the file does not exist, it creates a new one.
    • If the file already exists, it truncates it.
  • a – Opens a text file to append text to the end of the file.
    • If the file does not exist, it creates a new one.
    • If the file exists, it does not truncate it.
  • + – Added to another mode (r+, w+, a+), opens the file for both reading and writing.
  • b – Added to another mode, opens the file in binary mode.
  • rb – Opens the file in binary mode for reading.
  • wb – Opens the file in binary mode for writing.
  • ab – Same as a in text mode, but for binary mode.

Here are some examples:

# open file "file.txt" to read
file_obj = open("file.txt", 'r')

# open file "file.txt" to write
file_obj = open("file.txt", 'w')

# open file "file.txt" to read and write
file_obj = open("file.txt", 'r+')

# open file "test.bmp" in binary mode to read
file_obj = open("test.bmp", 'rb')
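
To see how the w, a, and r+ modes differ in practice, here is a minimal sketch. It assumes a throwaway file named modes-demo.txt (a made-up name) inside a temporary directory, so no real files are touched.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes-demo.txt')

    # 'w' creates the file (or truncates an existing one)
    with open(path, 'w') as f:
        f.write('first\n')

    # 'w' again: the previous content is discarded
    with open(path, 'w') as f:
        f.write('second\n')

    # 'a' keeps the existing content and writes at the end
    with open(path, 'a') as f:
        f.write('third\n')

    # 'r+' opens an existing file for both reading and writing
    with open(path, 'r+') as f:
        content = f.read()

print(repr(content))
```

The first line written with w is gone, while the line written with a follows the surviving content.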

Closing a File in Python

After performing the actions with the file, we have to close it by calling the close() method of the handler. Closing a file flushes any buffered data to disk and releases the resources held for it, which is important when you need to process many files. There are several ways to close files.

Explicit file close operation in Python

In this method, you close the file by explicitly calling the close() method of the file handler.

# open file 'file.txt' to read
file_obj = open('file.txt')

# performing file operations

# explicit file closing
file_obj.close()

The code example above does not guarantee that Python will close the file: if an exception is raised during the file operations, close() is never called.

Another approach is to use the try-finally block, as it guarantees that Python will close the file even if an exception happens.

file_obj = None
try:
    # open file to read
    file_obj = open('file.txt')

    # perform file operations
finally:
    # close the file only if it was opened successfully
    if file_obj is not None:
        file_obj.close()

Implicit file close operation in Python

We can simplify the code block above by using the with statement, which wraps the executed block with methods defined by a context manager. This allows you to encapsulate and reuse the try-finally pattern: the file is closed automatically when the block exits.

with open('file.txt') as f:
    print(f.read())
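
The following sketch checks that the with statement really closes the file when something goes wrong inside the block. The temporary file and the simulated ValueError are assumptions made only for this demonstration.

```python
import os
import tempfile

# Create a small temporary file to work with (made-up demo data)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Hands-On.Cloud!\n')
    demo_path = tmp.name

try:
    with open(demo_path) as f:
        f.read()
        raise ValueError('simulated failure during file operations')
except ValueError:
    pass

# The with statement closed the file although an exception was raised
closed_after_error = f.closed
print(f'File closed: {closed_after_error}')

os.remove(demo_path)
```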

Additional information on the topic is available in Python Coding Tip: Using the ‘With’ Statement Instead ‘Try…Finally’ article.

Reading files in Python

After opening the file in Python, we can read data from the file. There are several functions in Python that read the file content in different ways:

  • read() – reads the entire file content
  • readline() – reads a single line from the file
  • readlines() – reads all lines from the file into a list

Let’s review them one by one.

Reading text file to a string in Python

To read a file’s contents, call the read(size) method of the file handler object. It reads some quantity of data and returns it as a string in text mode or as a bytes object in binary mode. size is an optional numeric argument that determines how much data to read (characters in text mode, bytes in binary mode); if it is omitted or negative, the entire file is read.

Here is the example:

with open('file.txt') as f:
    file_content = f.read()
    print(file_content)

In the example above, we open an existing text file, read it entirely into the file_content variable, and then print it.
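
When a file is too large to read at once, the size argument can be used to process it piece by piece. This is a minimal sketch: the helper read_in_chunks and the chunk size of 8 are made up for the example, and io.StringIO stands in for a file opened in text mode.

```python
import io


def read_in_chunks(file_obj, chunk_size=8):
    """Yield successive chunks of at most `chunk_size` characters."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            # read() returns an empty string at the end of the file
            break
        yield chunk


# io.StringIO stands in for a file opened in text mode
with io.StringIO('Hands-On.Cloud!') as f:
    chunks = list(read_in_chunks(f))

print(chunks)
```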

Reading a single line from a text file in Python

To read a single line from the file, use the readline() method of the file handler. Note that this method determines the end of the line by the special \n (newline) character, and the returned result includes this \n character.

with open('file.txt') as f:
    line = f.readline()
    print(line)

The example output:

Line 1
      <==== Additional new line is printed here

If you’d like to remove the \n character from the line, call the string’s strip() method (note that it removes all leading and trailing whitespace, not only the newline):

with open('file.txt') as f:
    line = f.readline()
    print(line.strip())

Reading text file to an array (list of strings) in Python

The readlines() method of the file handler reads the entire file (all lines at once) and returns the lines as a list object.

Here is the example:

with open('file.txt') as f:
    lines = f.readlines()
    for line in lines:
        print(line)

In the example above, we’re reading the entire file to a list object.

Note: Python will load the whole file content into memory.

Note: every element of the lines list will contain the \n character.

Reading text file line by line in Python

A more memory-efficient way of reading text files is doing it line by line. You can do it by applying a for loop to the file handler object:

with open('file.txt') as f:
    for line in f:
        print(line)

Note: the content of the line variable will contain the \n character.
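
A common variation of this loop numbers the lines and drops only the trailing newline. The sketch below assumes a three-line sample text, with io.StringIO standing in for a real text file.

```python
import io

# io.StringIO stands in for a text file with three lines
sample = io.StringIO('Line 1\nLine 2\nLine 3\n')

numbered = []
with sample as f:
    # enumerate() numbers the lines starting from 1
    for number, line in enumerate(f, start=1):
        # rstrip('\n') removes only the trailing newline
        clean = line.rstrip('\n')
        numbered.append(f'{number}: {clean}')

print(numbered)
```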

Reading binary files in Python

Now, let’s see how to read binary files in Python. Open the file with the rb access mode, and read() returns a bytes object:

with open('file.pdf', 'rb') as f:
    data = f.read()
    print(data)

Here’s what the output of the code looks like:

b'%PDF-1.4\n%\xb5\xed\xae\xfb\n4 0 obj\n<< /Length 5 0 R\n   /Filter /FlateDecode\n>>\nstream\nx\x9cER[\x8e\x1cA\x08\xfb\xafSp\x81\xb0@=\xa8:F\x8e\x10\xb5\x94\xcd\xc7\xcc\xc7&\xf7\x97b\xd3\xb3\xbb\x9aQ\x81\xe9.cL\xbb\x18~?\x1c\xc74\xb9\x9e\xed\xa3Y\x95\xfe\xbe\xcb\xdb/\x93\xf7\x7f-L\xb6<\xc5\xbb\xce~\x90n\xed+\t\xdd\x00\xd5\xce\x10?:\xe5\x92\xae\xc3\xb7\x84if\x82"\x86\x9e\xb3\x90t0\x0bx\xfbR\x1b)(\xcf.\xc3\xc5\x9d\xe7\xc5\x03\xff\x87\x8c\xa9gL\xe6\xb3^&`\x16x\x0716Z\x86\xb4\x91:\x10A\x9f\x07<]\x8f-\x89\xae+(\x02x\x9d)\x9e\x9a+\xa4o=h\xd4\xa5w\x9e\xd0\xe8z\xd6]1\xc8\x8b\xc6IbnB\xeb~\x8fIQ\x91\xba7.\x99\xc6\xaaf\xe1\x9c\xbe\xdc\xb8\xe4\x0fmq\x83/h\x9c\x91\xccc\x91P<\xca\x13\xa8M\xdc\xf0Y\x08|G}.\xe2\xed\x81a\x81O\xdf\xd2P`\xc3G\xe9A\x01\xd8\xd7.\x9dP3Kx\x99\x9be\x1b\xf2\x114\xbc\x9c\xa40\xce\x8cX}\x1a]\x0c\xcd\xed,\xc4~\xb9\x9a\xb8\x05|\x12v\xedR\x96L\xcaXFh\x9a\xe0\x1d\x15\xb0\xe4\xe1\x8d\xe1\xe2\x86\x10\x1fr/\x0e)\x1c\xbd\xd7\x17\xf7V\xab\xf1\xca\x10\xb4\x9fx\xec\xb5r\x8c\xba7\x01B#\x8d\x11X\x8d\xc9\xdc\xee\xe8CG\x1f_\xa1\x1e\xb8\xdd\xee\x0e\x82\'\xc7\x83\x13s\xd11\x0cMG\xb1\x98{0\xdf\x95\xbc\x8c~\xd4\x8e\x80!\xf0\x85\xa74\x8c6\xf0u\x10T\xf5\xeb!X\x8b\xec\xf32\xdc/\xb2O\xf2X\xdf\x8d\x91\xb7\xd2\xfc\xad\xebw\xfb\xd9\xfe\x03@\x1c\x9e\xdf\nendstream\nendobj\n5 0 obj\n   402\nendobj\n3 0 obj\n<<\n   /ExtGState <<\n      /a0 << /CA 1 /ca 1 >>\n   >>\n>>\nendobj\n2 0 obj\n<< /Type /Page % 1\n   /Parent 1 0 R\n   /MediaBox [ 0 0 50 50 ]\n   /Contents 4 0 R\n   /Group <<\n      /Type /Group\n      /S /Transparency\n      /I true\n      /CS /DeviceRGB\n   >>\n   /Resources 3 0 R\n>>\nendobj\n1 0 obj\n<< /Type /Pages\n   /Kids [ 2 0 R ]\n   /Count 1\n>>\nendobj\n6 0 obj\n<< /Producer (cairo 1.16.0 (https://cairographics.org))\n   /CreationDate (D:20210720174858Z)\n>>\nendobj\n7 0 obj\n<< /Type /Catalog\n   /Pages 1 0 R\n>>\nendobj\nxref\n0 8\n0000000000 65535 f \n0000000804 00000 n \n0000000588 00000 n \n0000000516 00000 n \n0000000015 00000 n \n0000000494 00000 n 
\n0000000869 00000 n \n0000000980 00000 n \ntrailer\n<< /Size 8\n   /Root 7 0 R\n   /Info 6 0 R\n>>\nstartxref\n1032\n%%EOF\n'
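
Often you do not need the whole binary file, only its first bytes, for example to check the file type. The sketch below writes a tiny made-up sample that starts like the PDF output above and then reads just the header; the sample content is an assumption for illustration and is not a valid PDF.

```python
import os
import tempfile

# Write a few bytes that start like a PDF header (made-up sample data)
with tempfile.NamedTemporaryFile('wb', suffix='.pdf', delete=False) as tmp:
    tmp.write(b'%PDF-1.4\n%sample body\n')
    sample_path = tmp.name

with open(sample_path, 'rb') as f:
    # Read only the first five bytes instead of the whole file
    header = f.read(5)

is_pdf = header == b'%PDF-'
print(f'Header: {header!r}, looks like a PDF: {is_pdf}')

os.remove(sample_path)
```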

Writing files in Python

You can use several methods to write data to a file in Python:

  • write(string) – writes a text string to a file opened in text mode
  • write(bytes_object) – writes a bytes object to a file opened in binary mode
  • writelines(strings_list) – writes all strings from a list to the file

Let’s use those methods one by one to write information to the file.

Writing single string line to a file in Python

In this method, we write a single line to a file by first opening the file and then calling the write() method. Here is the example:

with open('file.txt', 'w') as f:
    f.write('Hands-On.Cloud!\n')
    f.write('Hands-On.Cloud!\n')

The code above writes the string Hands-On.Cloud! to the file.txt file twice, each time on its own line.

Note: you have to add \n character to the end of each string to write it as a new line.
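
The same write() method works in binary mode, but it expects a bytes object instead of a string. Here is a minimal sketch, assuming a throwaway file named file.bin (a made-up name) in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'file.bin')

    # Binary mode accepts bytes objects, not str
    with open(path, 'wb') as f:
        # write() returns the number of bytes written
        written = f.write('Hands-On.Cloud!\n'.encode('utf-8'))

    with open(path, 'rb') as f:
        raw = f.read()

print(written, raw)
```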

Writing strings to a file from an array in Python

This method allows us to write multiple lines from a list of strings into a file:

with open('file.txt', 'w') as f:
    lines = [
        'I am writing first line. This is still first line.\n',
        'Now, I am on second line.\n',
        'This is 3rd\n'
    ]

    f.writelines(lines)

Note: you have to add the \n character to the end of each string in the list to write each element on a new line.

Here’s the resulting content of the file.txt file:

I am writing first line. This is still first line.
Now, I am on second line.
This is 3rd
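
If the strings in your list do not end with \n, one possible approach is to add it while writing. In the sketch below, io.StringIO stands in for a file opened in w mode, and the list contents are made up.

```python
import io

lines = ['first', 'second', 'third']  # note: no trailing \n

with io.StringIO() as f:
    # writelines() does not add separators, so append \n to each element
    f.writelines(f'{line}\n' for line in lines)
    result = f.getvalue()

print(repr(result))
```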

Appending strings to a file in Python

The w access mode overwrites the file if it already exists. To append text to the file, you have to use the a access mode:

with open('file.txt', 'a') as f:
    lines = [
        'Appended line 1\n',
        'Appended line 2\n'
    ]

    f.writelines(lines)

Note: you have to add the \n character to the end of each string in the list to write each element on a new line.

Since the file.txt file already exists, executing the code above appends two additional lines to it:

I am writing first line. This is still first line.
Now, I am on second line.
This is 3rd
Appended line 1
Appended line 2

Moving file cursor position in Python

The file object has two methods that allow you to manage the cursor position during file operations in Python: seek() and tell():

  • seek() method sets the position of a file pointer
  • tell() method returns the current position of a file pointer

Let’s say you have the file.txt file with these two lines in it:

The seek() and tell() example
From hands-on.cloud

The seek() method in Python

The seek() method has the following syntax:

seek(offset, whence)

The parameters are the following:

  • offset – the number of bytes to move the cursor, counted from the position defined by whence
  • whence – tells Python which cursor location to use as a starting point
    • 0 – beginning of the file (the default); offset should be zero or positive
    • 1 – current cursor location; offset may be positive or negative
    • 2 – end of the file; offset is usually zero or negative

Note: the seek() method supports nonzero offsets from the current position (whence=1) and from the end of the stream (whence=2) only in binary mode. For a file opened in text mode, such a call raises an error, for example: io.UnsupportedOperation: can't do nonzero cur-relative seeks.
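
The difference between the two modes can be checked with a short experiment. The sketch below writes the two sample lines to a temporary file and makes the same seek(3, 1) call in text mode and in binary mode; the variable names are made up for illustration.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'file.txt')
    with open(path, 'w') as f:
        f.write('The seek() and tell() example\nFrom hands-on.cloud')

    # Text mode: a nonzero offset relative to the current position fails
    with open(path) as f:
        try:
            f.seek(3, 1)
            text_mode_error = None
        except io.UnsupportedOperation as exc:
            text_mode_error = str(exc)

    # Binary mode: the same call is allowed
    with open(path, 'rb') as f:
        f.seek(3, 1)
        binary_position = f.tell()

print(f'Text mode error: {text_mode_error}')
print(f'Binary mode position: {binary_position}')
```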

Here are a few seek() usage examples in Python:

  • f.seek(0) – Move the file cursor to the beginning of the file
  • f.seek(0, 2) – Move the file cursor to the end of the file
  • f.seek(10) – Move the file cursor ten bytes ahead from the beginning of the file
  • f.seek(10, 1) – Move the file cursor ten bytes ahead from the current position
  • f.seek(-3, 1) – Move the file cursor three bytes back from the current position
  • f.seek(-10, 2) – Move the file cursor ten bytes before the end of the file

Let’s illustrate the results of seek() execution:

with open('file.txt', 'rb') as f:

    # Move file cursor to the beginning of the file
    print(f'Current cursor position: {f.tell()}')
    f.seek(0)
    print(f'seek(0) & readline() output: {f.readline()}\n')

    # Move file cursor to the end of a file
    print(f'Current cursor position: {f.tell()}')
    f.seek(0, 2)
    print(f'seek(0, 2) & readline() output: {f.readline()}\n')

    # Move file pointer ten characters ahead
    # from the beginning of a file.
    print(f'Current cursor position: {f.tell()}')
    f.seek(10)
    print(f'seek(10) & readline() output: {f.readline()}\n')

    # Move file pointer ten characters ahead
    # from the current position.
    print(f'Current cursor position: {f.tell()}')
    f.seek(10, 1)
    print(f'seek(10, 1) & readline() output: {f.readline()}\n')

    # Move file pointer three characters behind
    # from the current position.
    print(f'Current cursor position: {f.tell()}')
    f.seek(-3, 1)
    print(f'seek(-3, 1) & readline() output: {f.readline()}\n')

    # Move file cursor ten characters before
    # the end of the file
    print(f'Current cursor position: {f.tell()}')
    f.seek(-10, 2)
    print(f'seek(-10, 2) & readline() output: {f.readline()}\n')

The expected output will be:

Current cursor position: 0
seek(0) & readline() output: b'The seek() and tell() example\n'

Current cursor position: 30
# seek(0, 2) moved the cursor from position 30 to the end of the file
# That's why the output of readline() is empty
seek(0, 2) & readline() output: b''

Current cursor position: 49
seek(10) & readline() output: b' and tell() example\n'

Current cursor position: 30
seek(10, 1) & readline() output: b'-on.cloud'

Current cursor position: 49
seek(-3, 1) & readline() output: b'oud'

Current cursor position: 49
seek(-10, 2) & readline() output: b's-on.cloud'

The tell() method in Python

As you’ve probably noticed in the previous example, the tell() method returns the current cursor position:

with open('file.txt') as f:
    # The cursor is at the beginning of a file
    pos = f.tell()
    print(f'Current cursor position: {pos}')

    # Move file pointer ten characters ahead
    # from the beginning of a file.
    f.seek(10)

    pos = f.tell()
    print(f'Current cursor position: {pos}')

Example of the script output:

Current cursor position: 0
Current cursor position: 10
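
Combining seek() and tell() gives a simple way to find the size of a stream in bytes: move to the end and ask for the position. This is only a sketch, with io.BytesIO standing in for a file opened in rb mode and holding the two sample lines.

```python
import io

# io.BytesIO stands in for a file opened in binary mode
with io.BytesIO(b'The seek() and tell() example\nFrom hands-on.cloud') as f:
    f.seek(0, io.SEEK_END)   # same as f.seek(0, 2)
    size = f.tell()          # position at the end equals the size in bytes

    f.seek(0)                # rewind before reading
    first_line = f.readline()

print(f'Size: {size} bytes, first line: {first_line}')
```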

FAQ

How to create a file in Python

The built-in open() function with access mode w or x allows creating an empty file:

# create or recreate an empty file
with open('file.txt', 'w') as f:
    pass

# create an empty file
# if the file exists, this call will
# generate FileExistsError exception
with open('file.txt', 'x') as f:
    pass
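
The x mode is handy when you want to create a file only if it does not exist yet. Below is a small sketch; the helper name create_if_missing is made up for this example.

```python
import os
import tempfile


def create_if_missing(path):
    """Create an empty file; return False if it already exists."""
    try:
        with open(path, 'x'):
            pass
        return True
    except FileExistsError:
        return False


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'file.txt')
    first_attempt = create_if_missing(path)   # the file is created
    second_attempt = create_if_missing(path)  # the file already exists

print(first_attempt, second_attempt)
```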

How to read and write files in memory in Python

One of the commonly asked questions is how to read or write files in memory in Python. To do that, use the StringIO class from the io module, which supports the same methods as the file handler object:

import io

with io.StringIO() as f:
    f.write('First line.\n')
    f.write('Second line.\n')

    f.seek(0)
    line = f.readline()
    print(f'Line: {line}')

    f.seek(0)
    lines = f.readlines()
    print(f'All lines: {lines}')

    f.seek(0)
    contents = f.read()
    print(f'\nFull file content:\n{contents}')

The expected output will look like this:

Line: First line.

All lines: ['First line.\n', 'Second line.\n']

Full file content:
First line.
Second line.
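
For binary data, the io module offers the BytesIO class, which behaves like a file opened in binary mode. A minimal sketch with made-up byte values:

```python
import io

with io.BytesIO() as f:
    f.write(b'\x00\x01\x02')
    f.write(b'\x03\x04')

    # getvalue() returns everything written so far, regardless of the cursor
    buffer_contents = f.getvalue()

    # Move to the second byte and read two bytes from there
    f.seek(1)
    two_bytes = f.read(2)

print(buffer_contents, two_bytes)
```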

How to print to file in Python

In Python, it is possible to use the print() function to print data to a file. To do this, use its file argument:

import io

with io.StringIO() as f:
    # print() appends a newline itself, so no \n is needed
    print('First line.', file=f)
    print('Second line.', file=f)

    f.seek(0)
    line = f.readline()
    print(f'Line: {line}')

    f.seek(0)
    lines = f.readlines()
    print(f'All lines: {lines}')

    # Retrieve file contents -- this will be
    # 'First line.\nSecond line.\n'
    f.seek(0)
    contents = f.read()
    print(f'\nFull file content:\n{contents}')

How to read and write JSON files in Python

Python provides a useful json module that allows you to work with JSON data. You can use this module to read and write JSON data to and from files:

import json

data = {
    'products': [
        {'id': 1, 'product_name': 'Pen'},
        {'id': 2, 'product_name': 'Table'},
        {'id': 3, 'product_name': 'Laptop'},
    ]
}

# writing JSON data to file
with open('file.json', 'w') as f:
    json.dump(data, f)

# reading JSON data from file
with open('file.json', 'r') as f:
    read_data = json.load(f)

print(f'Read data: {read_data}')

We’ve saved the Python dictionary as JSON data to the file and read it back in the example above.
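
The json module can also work with strings instead of files, which is useful for logging or for sending data over the network. The sketch below uses a shortened, made-up version of the products data.

```python
import json

data = {'products': [{'id': 1, 'product_name': 'Pen'}]}

# indent makes the output human-readable; sort_keys gives a stable key order
text = json.dumps(data, indent=2, sort_keys=True)
print(text)

# json.loads() parses a JSON string back into Python objects
restored = json.loads(text)
print(restored == data)
```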

How to read and write CSV files in Python

Python provides a useful csv module that allows you to work with CSV data. You can use this module to read and write CSV data to and from files:

import csv

data = [
    ['id', 'product_name'],
    [1, 'Pen'],
    [2, 'Table'],
    [3, 'Laptop']
]

# writing CSV data to file
# (newline='' lets the csv module control line endings)
with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(data)

read_data = []
# reading CSV data from file
with open('file.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        read_data.append(row)

print(f'Read data: {read_data}')
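
If your records are dictionaries rather than lists, csv.DictWriter is the counterpart of csv.DictReader. The sketch below uses made-up rows and io.StringIO in place of a real file; note that values read back from CSV are always strings.

```python
import csv
import io

rows = [
    {'id': 1, 'product_name': 'Pen'},
    {'id': 2, 'product_name': 'Table'},
]

# io.StringIO stands in for a file opened with newline=''
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=['id', 'product_name'])
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)
read_rows = list(csv.DictReader(buffer))

# Values come back as strings, so convert them where needed
ids = [int(row['id']) for row in read_rows]

print(read_rows)
print(ids)
```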

How to store and retrieve any type of data in Python

To store Python objects and retrieve them between your program executions, you can use the pickle module:

import pickle


class Person():
    def __init__(self, age, name):
        self.age = age
        self.name = name

    def __str__(self):
        return f'class Person({self.age}, {self.name})'


person_1 = Person(35, 'Joe')
person_2 = Person(42, 'Jane')

data = [
    person_1, person_2
]

# persisting data to disk
with open('file.pickle', 'wb') as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)


# reading data from disk
read_data = []
with open('file.pickle', 'rb') as f:
    read_data = pickle.load(f)

for p in read_data:
    print(f'{p}')

In the example above, we did the following:

  • Defined a custom object (Person) as a Python class
  • Instantiated class objects (person_1 and person_2)
  • Added the class objects to a Python list
  • Persisted (saved) the list to a .pickle file
  • Loaded the list of class objects from the .pickle file

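You can use this approach to save and load most Python objects. Only unpickle data that comes from a source you trust, because loading a pickle file can execute arbitrary code.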
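
When you do not need a file on disk, pickle can serialize to and from bytes in memory. A minimal sketch, using a made-up settings dictionary:

```python
import pickle

# Hypothetical data used only for this illustration
settings = {'retries': 3, 'regions': ['us-east-1', 'eu-west-1'], 'debug': False}

# dumps() returns the pickled bytes instead of writing them to a file
payload = pickle.dumps(settings, protocol=pickle.HIGHEST_PROTOCOL)

# loads() rebuilds an equal (but separate) object from the bytes
restored = pickle.loads(payload)

print(restored == settings, restored is settings)
```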

Summary

This article covered how to use Python to read and write text, CSV, JSON, and binary files, including persisting custom objects.
