The Structured Text Extractor API provides a simple way to extract and transform web page content. Its clean text endpoint delivers unformatted, readable content, ideal for text analysis or simplified presentation. The Markdown endpoint goes a step further, producing structured Markdown that is ideal for integration with Markdown-aware tools and systems. The API supports a wide range of web page types, ensuring reliable performance and adaptability across applications, making it indispensable for content parsing and transformation.
To use this endpoint, send a request containing the URL of a web page and receive the clean text extracted from that page's content.
Markdownify API - Endpoint Features
| Object | Description |
|---|---|
| Request Body | [Required] JSON |
{"response":"Spark Basics\nSuppose we have a web application hosted in an application orchestrator like kubernetes. If load in that particular application increases then we can horizontally scale our application simply by increasing the number of pods in our service.\nNow let’s suppose there is heavy compute operation happening in each of the pods. Then there will be certain limit upto which these services can run because unlike horizontal scaling where you can have as many numbers of machines as required, there is limit for vertical scaling because you can’t have unlimited ram and cpu cores for each of the machines in a cluster. Distributed Computing removes this limitation of vertical scaling by distributing the processing across cluster of machines. Now, a group of machines alone is not powerful, you need a framework to coordinate work across them. Spark does just that, managing and coordinating the execution of tasks on data across a cluster of computers. The cluster of machines that Spark will use to execute tasks is managed by a cluster manager like Spark’s standalone cluster manager, Kubernetes, YARN, or Mesos.\nSpark Basics\nSpark is distributed data processing engine. Distributed data processing in big data is simply series of map and reduce functions which runs across the cluster machines. Given below is python code for calculating the sum of all the even numbers from a given list with the help of map and reduce functions.\nfrom functools import reduce\na = [1,2,3,4,5]\nres = reduce(lambda x,y: x+y, (map(lambda x: x if x%2==0 else 0, a)))\nNow consider, if instead of a simple list, it is a parquet file of size in order of gigabytes. Computation with MapReduce system becomes optimized way of dealing with such problems. In this case spark will load the big parquet file into multiple worker nodes (if the file doesn’t support distributed storage then it will be first loaded into driver node and afterwards, it will get distributed across the worker nodes). Then map function will be executed for each task in each worker node and the final result will fetched with the reduce function.\nSpark timeline\nGoogle was first to introduce large scale distributed computing solution with MapReduce and its own distributed file system i.e., Google File System(GFS). GFS provided a blueprint for the Hadoop File System (HDFS), including the MapReduce implementation as a framework for distributed computing. Apache Hadoop framework was developed consisting of Hadoop Common, MapReduce, HDFS, and Apache Hadoop YARN. There were various limitations with Apache Hadoop like it fell short for combining other workloads such as machine learning, streaming, or interactive SQL-like queries etc. Also the results of the reduce computations were written to a local disk for subsequent stage of operations. Then came the Spark. Spark provides in-memory storage for intermediate computations, making it much faster than Hadoop MapReduce. It incorporates libraries with composable APIs for machine learning (MLlib), SQL for interactive queries (Spark SQL), stream processing (Structured Streaming) for interacting with real-time data, and graph processing (GraphX).\nSpark Application\nSpark Applications consist of a driver process and a set of executor processes. The driver process runs your main() function, sits on a node in the cluster. The executors are responsible for actually carrying out the work that the driver assigns them. 
The driver and executors are simply processes, which means that they can live on the same machine or different machines.\nThere is a SparkSession object available to the user, which is the entrance point to running Spark code. When using Spark from Python or R, you don’t write explicit JVM instructions; instead, you write Python and R code that Spark translates into code that it then can run on the executor JVMs.\nSpark’s language APIs make it possible for you to run Spark code using various programming languages like Scala, Java, Python, SQL and R.\nSpark has two fundamental sets of APIs: the low-level “unstructured” APIs (RDDs), and the higher-level structured APIs (Dataframes, Datasets).\nSpark Toolsets\nA DataFrame is the most common Structured API and simply represents a table of data with rows and columns. To allow every executor to perform work in parallel, Spark breaks up the data into chunks called partitions. A partition is a collection of rows that sit on one physical machine in your cluster.\nIf a function returns a Dataframe or Dataset or Resilient Distributed Dataset (RDD) then it is a transformation and if it doesn’t return anything then it’s an action. An action instructs Spark to compute a result from a series of transformations. The simplest action is count.\nTransformation are of types narrow and wide. Narrow transformations are those for which each input partition will contribute to only one output partition. Wide transformation will have input partitions contributing to many output partitions.\nSparks performs a lazy evaluation which means that Spark will wait until the very last moment to execute the graph of computation instructions. This provides immense benefits because Spark can optimize the entire data flow from end to end.\nSpark-submit\nReferences\n- https://spark.apache.org/docs/latest/\n- spark: The Definitive Guide by Bill Chambers and Matei Zaharia"}
```bash
curl --location --request POST 'https://zylalabs.com/api/5662/structured+text+extractor+api/7373/markdownify+api' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --data-raw '{
    "url": "https://techtalkverse.com/post/software-development/spark-basics/"
  }'
```
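The same request can be made from code. Below is a minimal Python sketch using the requests library, assuming the endpoint URL and Bearer token shown in the curl example above (YOUR_API_KEY and the target URL are placeholders to replace with your own values):

```python
import requests

# Endpoint and key taken from the curl example above; YOUR_API_KEY is a placeholder.
API_URL = "https://zylalabs.com/api/5662/structured+text+extractor+api/7373/markdownify+api"
API_KEY = "YOUR_API_KEY"

payload = {"url": "https://techtalkverse.com/post/software-development/spark-basics/"}
headers = {"Authorization": f"Bearer {API_KEY}"}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
resp.raise_for_status()

# Per the sample response above, the extracted clean text is returned under the "response" key.
clean_text = resp.json()["response"]
print(clean_text[:500])
```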
To use this endpoint, send a request containing the URL of a web page and receive that page's content converted to Markdown format.
Clean Page Converter - Endpoint Features
| Object | Description |
|---|---|
| Request Body | [Required] JSON |
{"response":"---\ntitle: Spark Basics\nurl: https://techtalkverse.com/post/software-development/spark-basics/\nhostname: techtalkverse.com\ndescription: Suppose we have a web application hosted in an application orchestrator like kubernetes. If load in that particular application increases then we can horizontally scale our application simply by increasing the number of pods in our service.\nsitename: techtalkverse.com\ndate: 2023-05-01\ncategories: ['post']\n---\n# Spark Basics\n\nSuppose we have a web application hosted in an application orchestrator like kubernetes. If load in that particular application increases then we can horizontally scale our application simply by increasing the number of pods in our service.\n\nNow let’s suppose there is heavy compute operation happening in each of the pods. Then there will be certain limit upto which these services can run because unlike horizontal scaling where you can have as many numbers of machines as required, there is limit for vertical scaling because you can’t have unlimited ram and cpu cores for each of the machines in a cluster. **Distributed Computing** removes this limitation of vertical scaling by distributing the processing across cluster of machines.\nNow, a group of machines alone is not powerful, you need a framework to\ncoordinate work across them. Spark does just that, managing and coordinating the execution of tasks on data across a cluster of computers. The cluster of machines that Spark will use to execute tasks is managed by a cluster manager like Spark’s standalone cluster manager, Kubernetes, YARN, or Mesos.\n\n## Spark Basics\n\nSpark is distributed data processing engine. Distributed data processing in big data is simply series of map and reduce functions which runs across the cluster machines. Given below is python code for calculating the sum of all the even numbers from a given list with the help of map and reduce functions.\n\n```\nfrom functools import reduce\na = [1,2,3,4,5]\nres = reduce(lambda x,y: x+y, (map(lambda x: x if x%2==0 else 0, a)))\n```\n\n\nNow consider, if instead of a simple list, it is a parquet file of size in order of gigabytes. Computation with MapReduce system becomes optimized way of dealing with such problems. In this case spark will load the big parquet file into multiple worker nodes (if the file doesn’t support distributed storage then it will be first loaded into driver node and afterwards, it will get distributed across the worker nodes). Then map function will be executed for each task in each worker node and the final result will fetched with the reduce function.\n\n## Spark timeline\n\nGoogle was first to introduce large scale distributed computing solution with **MapReduce** and its own distributed file system i.e., **Google File System(GFS)**. GFS provided a blueprint for the **Hadoop File System (HDFS)**, including the MapReduce implementation as a framework for distributed computing. **Apache Hadoop** framework was developed consisting of Hadoop Common, MapReduce, HDFS, and Apache Hadoop YARN. There were various limitations with Apache Hadoop like it fell short for combining other workloads such as machine learning, streaming, or interactive SQL-like queries etc. Also the results of the reduce computations were written to a local disk for subsequent stage of operations. Then came the **Spark**. Spark provides in-memory storage for intermediate computations, making it much faster than Hadoop MapReduce. 
It incorporates libraries with composable APIs for\nmachine learning (MLlib), SQL for interactive queries (Spark SQL), stream processing (Structured Streaming) for interacting with real-time data, and graph processing (GraphX).\n\n## Spark Application\n\n**Spark Applications** consist of a driver process and a set of executor processes. The **driver** process runs your main() function, sits on a node in the cluster. The **executors** are responsible for actually carrying out the work that the driver assigns them. The driver and executors are simply processes, which means that they can live on the same machine or different machines.\n\nThere is a **SparkSession** object available to the user, which is the entrance point to running Spark code. When using Spark from Python or R, you don’t write explicit JVM instructions; instead, you write Python and R code that Spark translates into code that it then can run on the executor JVMs.\n**Spark’s language APIs** make it possible for you to run Spark code using various programming languages like Scala, Java, Python, SQL and R.\nSpark has two fundamental sets of APIs: the **low-level “unstructured” APIs** (RDDs), and the **higher-level structured APIs** (Dataframes, Datasets).\n\n## Spark Toolsets\n\nA **DataFrame** is the most common Structured API and simply represents a table of data with rows and columns. To allow every executor to perform work in parallel, Spark breaks up the data into chunks called partitions. A **partition** is a collection of rows that sit on one physical machine in your cluster.\n\nIf a function returns a Dataframe or Dataset or Resilient Distributed Dataset (RDD) then it is a **transformation** and if it doesn’t return anything then it’s an **action**. An action instructs Spark to compute a result from a series of transformations. The simplest action is count.\n\nTransformation are of types narrow and wide. **Narrow transformations** are those for which each input partition will contribute to only one output partition. **Wide transformation** will have input partitions contributing to many output partitions.\n\nSparks performs a **lazy evaluation** which means that Spark will wait until the very last moment to execute the graph of computation instructions. This provides immense benefits because Spark can optimize the entire data flow from end to end.\n\n## Spark-submit\n\n## References\n\n- https://spark.apache.org/docs/latest/\n- spark: The Definitive Guide by Bill Chambers and Matei Zaharia"}
```bash
curl --location --request POST 'https://zylalabs.com/api/5662/structured+text+extractor+api/7374/clean+page+converter' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --data-raw '{
    "url": "https://techtalkverse.com/post/software-development/spark-basics/"
  }'
```
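A corresponding Python sketch for this endpoint, assuming the same request shape as above, that saves the returned Markdown (front matter included) to a local file for use in Markdown-aware tools:

```python
import requests

# Endpoint and key taken from the curl example above; YOUR_API_KEY is a placeholder.
API_URL = "https://zylalabs.com/api/5662/structured+text+extractor+api/7374/clean+page+converter"
API_KEY = "YOUR_API_KEY"

payload = {"url": "https://techtalkverse.com/post/software-development/spark-basics/"}
headers = {"Authorization": f"Bearer {API_KEY}"}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
resp.raise_for_status()

# The "response" field holds the page rendered as Markdown, front matter included.
markdown = resp.json()["response"]

# Save it for a static site generator, wiki, or other Markdown-compatible system.
with open("spark-basics.md", "w", encoding="utf-8") as f:
    f.write(markdown)
```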
| Header | Description |
|---|---|
| Authorization | [Required] Should be Bearer access_key. After subscribing, see "Your API Access Key" above. |
No long-term commitment. Upgrade, downgrade, or cancel at any time. The free trial includes up to 50 requests.
The Structured Text Extractor API is a tool designed to extract clean text or Markdown from any web page, simplifying content processing for a wide range of applications.
The API can extract both unformatted, readable content and structured Markdown, making it suitable for text analysis and for integration with Markdown-compatible tools.
The clean text endpoint delivers unformatted content from a web page, ensuring the extracted text is readable and suitable for analysis or presentation, free of any HTML tags.
The Markdown endpoint produces structured Markdown, aimed at users who need to integrate extracted content with Markdown-compatible systems, improving usability and formatting.
Yes, the Structured Text Extractor API supports a wide variety of web page types, ensuring reliable performance and adaptability across content parsing and conversion needs.
The clean text endpoint returns unformatted, readable text extracted from the web page, while the Markdown endpoint returns structured Markdown that includes metadata such as the title, URL, and description, along with the formatted content.
The clean text response contains only the extracted text, while the Markdown response includes the title, URL, hostname, description, date, and categories, followed by the main content organized in Markdown format.
As the sample responses above show, both endpoints return a JSON object with a single "response" field: the clean text endpoint's value is a plain string, while the Markdown endpoint's value begins with front-matter metadata followed by the Markdown body, making specific metadata and content easy to access.
The clean text endpoint provides plain text for analysis, while the Markdown endpoint delivers richer content with metadata, making it well suited to documentation, blogs, and content management systems.
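For example, the metadata can be separated from the Markdown body by splitting on the "---" front-matter delimiters visible in the sample response above. A minimal sketch (the split_front_matter helper is illustrative, not part of the API):

```python
def split_front_matter(markdown: str):
    """Split a '---'-fenced front-matter block from the Markdown body."""
    if not markdown.startswith("---\n"):
        return {}, markdown
    _, front, body = markdown.split("---\n", 2)
    meta = {}
    for line in front.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, body

# With the sample response above:
# meta, body = split_front_matter(markdown)
# meta["title"] == "Spark Basics", meta["date"] == "2023-05-01"
```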
The main parameter for both endpoints is the URL of the web page to extract content from. Users can customize requests to target specific content by supplying different URLs.
Users can analyze the clean text for insights, or use the Markdown output to create formatted documents, integrate with content management systems, or enhance Markdown-aware web applications.
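As one illustration of the analysis side, a minimal sketch that counts the most frequent words in the extracted text (clean_text stands for the string returned in the "response" field of the clean text endpoint):

```python
import re
from collections import Counter

# Placeholder: in practice, clean_text is the "response" value returned by the API.
clean_text = "Spark is distributed data processing engine ..."

# Tokenize, drop very short words, and report the ten most common terms.
words = re.findall(r"[a-z']+", clean_text.lower())
common = Counter(w for w in words if len(w) > 3).most_common(10)
print(common)
```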
The API uses robust parsing algorithms to extract content from web pages accurately, ensuring that the returned text and Markdown reflect the original content as closely as possible.
Common use cases include content aggregation, text analysis, blog post creation, and converting web page content to Markdown for documentation or for publishing on Markdown-compatible platforms.