In [ ]:
from dotenv import load_dotenv

load_dotenv(override=True)
Out[ ]:
True
In [ ]:
import pandas as pd

data = pd.read_csv("data.csv")
data.head()
Out[ ]:
Name Age Gender City Occupation Income
0 Wang Lei 32 Male Beijing Teacher 10000
1 Li Wei Jia 23 Female Shanghai Doctor 20000
2 Zheng Xiao Gang 45 Male Guangzhou Policeman 10000
3 Huang Xiao Ning 28 Male Shenzhen Programmer 15000
4 He Li Na 35 Female Chengdu Lawyer 25000
In [ ]:
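import os
from langchain_groq import ChatGroq

# Groq-hosted chat model; the API key is read from the environment,
# which load_dotenv() populated from the .env file above.
llm = ChatGroq(
    model_name="mixtral-8x7b-32768",
    api_key=os.environ["GROQ_API_KEY"],
)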
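
`os.environ["GROQ_API_KEY"]` raises a bare `KeyError` when the variable is missing, for example if the `.env` file was not found. A small helper can turn that into a clearer message. This is only a sketch: `require_env` is a hypothetical name, not part of dotenv or langchain_groq.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a descriptive error.

    Hypothetical helper for illustration; an empty value is treated as missing.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value
```

With this in place, the client could be built with `api_key=require_env("GROQ_API_KEY")` instead of indexing `os.environ` directly.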
In [ ]:
from pandasai import SmartDataframe

df = SmartDataframe(data, config={"llm": llm})

Which five cities have the highest average income?

In [ ]:
df.chat("What are the top 5 cities for average income?")
Out[ ]:
Income
City
Qingdao 40000.0
Shenyang 37500.0
Xi'an 32500.0
Kunming 31500.0
Chengdu 26600.0
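
An answer like the one above can be cross-checked with plain pandas. The sketch below shows one way to do that, assuming the `City` and `Income` columns shown by `data.head()`. The `sample` frame is invented for illustration and stands in for `data.csv`, so its numbers will not match the output above.

```python
import pandas as pd

# Hypothetical stand-in for data.csv: same column names, invented rows.
sample = pd.DataFrame(
    {
        "City": ["Beijing", "Beijing", "Shanghai", "Chengdu", "Chengdu", "Shenzhen"],
        "Income": [10000, 14000, 20000, 25000, 27000, 15000],
    }
)


def top_cities_by_mean_income(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n cities with the highest mean income, highest first."""
    return frame.groupby("City")["Income"].mean().nlargest(n)


top_cities = top_cities_by_mean_income(sample, n=3)
print(top_cities)
```

Running the same function on the real `data` frame with `n=5` should reproduce the table returned by `df.chat`, provided the model grouped and averaged the same way.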

Calculate the average income for each occupation, then rank the top 5 occupations from highest to lowest

In [ ]:
df.chat(
    "Calculate the average income for each occupation and then rank the top 5 occupations from highest to lowest"
)
<string>:2: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
Out[ ]:
'/Users/chi/Desktop/quartz/content/study/exports/charts/temp_chart.png'
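
The `FutureWarning` above comes from pandas. It suggests that the generated code called `mean()` on a grouped frame that still contained non-numeric columns such as `Name` and `City`, although the generated code itself is not shown here. A hand-written version, sketched below under that assumption, avoids the warning by selecting `Income` before aggregating. The `sample` rows are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for data.csv: same column names, invented rows.
sample = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D", "E", "F"],
        "Occupation": ["Teacher", "Teacher", "Doctor", "Lawyer", "Programmer", "Doctor"],
        "Income": [10000, 12000, 20000, 25000, 15000, 22000],
    }
)


def top_occupations_by_mean_income(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Mean income per occupation, sorted from highest to lowest, top n only."""
    return (
        frame.groupby("Occupation")["Income"]  # select the numeric column first
        .mean()
        .sort_values(ascending=False)
        .head(n)
    )


ranked = top_occupations_by_mean_income(sample)
print(ranked)
```

Selecting the column before calling `mean()` keeps the result independent of how the `numeric_only` default changes between pandas versions.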

First, sort the cities by average income from high to low, then create a bar chart showing the top 10 cities by average income

In [ ]:
df.chat(
    "First, sort the cities by average income from high to low, then create a bar chart displaying the top 10 cities by average income"
)
Out[ ]:
'/Users/chi/Desktop/quartz/content/study/exports/charts/temp_chart.png'
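
For comparison, a similar bar chart can be drawn by hand with pandas and matplotlib. This is a minimal sketch, not the code PandasAI generated: the `sample` rows, the chart labels, and the output file name are all assumptions made for illustration.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for data.csv: same column names, invented rows.
sample = pd.DataFrame(
    {
        "City": ["Beijing", "Beijing", "Shanghai", "Chengdu", "Chengdu", "Shenzhen"],
        "Income": [10000, 14000, 20000, 25000, 27000, 15000],
    }
)

# Average income per city, highest first, limited to at most 10 cities.
means = (
    sample.groupby("City")["Income"].mean().sort_values(ascending=False).head(10)
)

fig, ax = plt.subplots(figsize=(8, 4))
means.plot.bar(ax=ax)
ax.set_xlabel("City")
ax.set_ylabel("Average income")
ax.set_title("Top cities by average income")
fig.tight_layout()

# Save to a temporary directory instead of a fixed location.
chart_path = os.path.join(tempfile.mkdtemp(), "top_cities.png")
fig.savefig(chart_path)
plt.close(fig)
print(chart_path)
```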

First, count the number of people in each occupation, then create a pie chart of the top 5 occupations by count

In [ ]:
df.chat(
    "First, calculate the number of people in each occupation, and then create a pie chart for the top 5 occupations by count"
)
Out[ ]:
'/Users/chi/Desktop/quartz/content/study/exports/charts/temp_chart.png'
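
The counts behind a pie chart like this one can be computed directly with `value_counts`. The sketch below assumes the `Occupation` column shown by `data.head()` and uses invented rows, so it illustrates the calculation rather than reproducing the chart above.

```python
import pandas as pd

# Hypothetical stand-in for data.csv: only the column needed here, invented rows.
sample = pd.DataFrame(
    {
        "Occupation": [
            "Teacher", "Doctor", "Teacher", "Lawyer",
            "Doctor", "Teacher", "Programmer",
        ]
    }
)


def top_occupation_counts(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Number of people per occupation, most common first, top n only."""
    return frame["Occupation"].value_counts().head(n)


counts = top_occupation_counts(sample, n=2)
print(counts)

# With matplotlib installed, counts.plot.pie(autopct="%1.1f%%") would draw the chart.
```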