TIL: Pandas - Read CSV With Custom Separator Using Regex

If you want to load a CSV file into a Pandas DataFrame, you can use [pandas.read_csv][readcsv].

The function takes several options. One of them is `sep`, the column delimiter, which defaults to a comma (`","`).

Instead of a plain string, you can also pass a regular expression as `sep` to customize the delimiter.

Let’s say your data looks like this:

```
vhigh,high,2,2,more,small
med,vhigh,3,more,big
...
```

You want to load that data into a Pandas DataFrame. You could split each line on every comma, but you want to ignore the comma inside numbers written with a decimal comma, like 2,2.
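
A quick check with plain `str.split` shows the problem. Splitting on every comma cuts the decimal number 2,2 in two, so the first sample row yields six fields instead of five.

```python
# First sample row from above; "2,2" is a single value written with a decimal comma.
line = "vhigh,high,2,2,more,small"

# Naive split on every comma: the decimal comma is treated as a separator too.
naive_fields = line.split(",")
print(naive_fields)       # ['vhigh', 'high', '2', '2', 'more', 'small']
print(len(naive_fields))  # 6, although the row has only five columns
```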

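```python
import pandas as pd

# Raw string (r"...") so Python does not treat \D as a string escape.
# header=None because the file has no header row; columns are labelled 0, 1, 2, ...
df = pd.read_csv("../path/to/file.csv", sep=r"(?<=\D),|,(?=\D)", engine="python", header=None)
```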

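That’s quite a complicated regular expression. It uses lookaround assertions. `(?<=\D),` matches a comma preceded by a non-digit, and `,(?=\D)` matches a comma followed by a non-digit. A comma with digits on both sides, as in 2,2, matches neither branch, so it stays part of the value.

Pandas can handle such a separator, but only with the Python parser engine, so pass `engine="python"`. The default C engine is faster, but it does not support regex separators. If you leave out `engine`, pandas falls back to the Python engine and emits a `ParserWarning`.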
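
To see exactly what the separator matches, you can run the same pattern through Python's `re.split` on the two sample rows. This is only a rough stand-in for what the Python engine does with a regex `sep`, not pandas itself. Only commas that are not sandwiched between two digits are split on.

```python
import re

# Same separator as in the read_csv call: a comma preceded by a non-digit,
# or a comma followed by a non-digit. A comma between two digits is kept.
SEP = re.compile(r"(?<=\D),|,(?=\D)")

for line in ["vhigh,high,2,2,more,small", "med,vhigh,3,more,big"]:
    print(SEP.split(line))
# ['vhigh', 'high', '2,2', 'more', 'small']
# ['med', 'vhigh', '3', 'more', 'big']
```

The lookarounds are zero-width and non-capturing, so `re.split` drops only the matched comma and does not add any captured groups to the result.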

You’ll get a result similar to this:

| 0     | 1     | 2   | 3    | 4     |
|-------|-------|-----|------|-------|
| vhigh | high  | 2,2 | more | small |
| med   | vhigh | 3   | more | big   |
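
If you want to try this without a file on disk, the sketch below feeds the two sample rows through `io.StringIO`. The inline `data` string is simply the sample from above. `header=None` is what makes pandas number the columns 0 to 4. Note that the value 2,2 comes back as the string `'2,2'`, not as a float.

```python
import io

import pandas as pd

# The two sample rows from above, held in memory instead of a file.
data = "vhigh,high,2,2,more,small\nmed,vhigh,3,more,big\n"

df = pd.read_csv(
    io.StringIO(data),
    sep=r"(?<=\D),|,(?=\D)",  # split on commas that are not between two digits
    engine="python",          # the C engine does not support regex separators
    header=None,              # no header row, so columns are labelled 0..4
)
print(df)
```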


Further Reading