If you want to load a CSV file into Pandas, you can use [`pandas.read_csv`][readcsv]. The function takes several options. One of them is `sep` (default value: `,`).
Instead of a single character, you can pass a regular expression to `sep` to customize the delimiter.
Let’s say your data looks like this:
```
vhigh,high,2,2,more,small
med,vhigh,3,more,big
...
```
You want to load that data into a Pandas DataFrame. You can split each line on the comma, but you want to keep floating point numbers that use a decimal comma, like `2,2`, in a single field.
```python
import pandas as pd
df = pd.read_csv("../path/to/file.csv", sep=r"(?<=\D),|,(?=\D)", header=None, engine="python")
```
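That’s quite a complicated regular expression. It uses lookaround assertions: `(?<=\D),` matches a comma preceded by a non-digit, and `,(?=\D)` matches a comma followed by a non-digit. A comma with a digit on both sides, like the one in `2,2`, matches neither alternative, so the line isn’t split there.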
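To check what the pattern does on its own, you can run it through Python’s standard `re.split` on the two sample lines. This is just a quick sketch for testing the regex in isolation, not a way to load the file.

```python
import re

# Same pattern as in the read_csv call: a comma preceded by a non-digit,
# or a comma followed by a non-digit. Lookarounds don't capture, so the
# commas themselves are dropped from the result.
pattern = re.compile(r"(?<=\D),|,(?=\D)")

for line in ["vhigh,high,2,2,more,small", "med,vhigh,3,more,big"]:
    print(pattern.split(line))
# ['vhigh', 'high', '2,2', 'more', 'small']
# ['med', 'vhigh', '3', 'more', 'big']
```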
Pandas can handle it, but you need to pass `engine="python"` to select the parser engine. The default C engine is faster, but it’s not as feature-complete and doesn’t support regular-expression separators.
You’ll get a result similar to this:
0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|
vhigh | high | 2,2 | more | small |
med | vhigh | 3 | more | big |
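If you want to try this without a file on disk, you can feed the two sample rows through `io.StringIO`. This sketch assumes the sample has no header row (hence `header=None`), which is what the numbered columns in the table suggest. Note that column `2` keeps `2,2` as text; turning it into a number would take an extra step.

```python
import io

import pandas as pd

# The two sample rows, held in memory instead of in a CSV file.
data = io.StringIO("vhigh,high,2,2,more,small\nmed,vhigh,3,more,big\n")

# Regex separator: split on commas with a non-digit on at least one side.
# header=None because the sample has no header row, so pandas numbers
# the columns 0..4.
df = pd.read_csv(data, sep=r"(?<=\D),|,(?=\D)", header=None, engine="python")

print(df)
```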