FLaNK Stack Weekly 05-February-2024

05-February-2024

FLaNK Stack Weekly

Tim Spann @PaaSDev

https://pebble.is/PaaSDev

https://vimeo.com/flankstack

https://www.youtube.com/@FLaNK-Stack

https://www.threads.net/@tspannhw

https://medium.com/@tspann/subscribe

Get your new Apache NiFi for Dummies!

https://www.cloudera.com/campaign/apache-nifi-for-dummies.html

https://ossinsight.io/analyze/tspannhw

Trial: https://console.us-west-1.cdp.cloudera.com/trial/register.html#/

Building Realtime AI Applications with Apache Flink

CODE + COMMUNITY

Please join my meetup groups: NJ/NYC/Philly/Virtual.

http://www.meetup.com/futureofdata-princeton/

https://www.meetup.com/futureofdata-newyork/

https://www.meetup.com/futureofdata-philadelphia/

This is Issue #123

https://github.com/tspannhw/FLiPStackWeekly

https://www.cloudera.com/solutions/dim-developer.html

Qualified Developers

https://www.linkedin.com/in/satya-n99999/

Articles

NiFi 2.0.0-M2 is Out! https://medium.com/@tspann/apache-nifi-2-0-0-m2-out-314a1d4c8b20

Apache NiFi and Amazon Textract for Machine Learning https://medium.com/@tspann/apache-nifi-and-amazon-textract-for-machine-learning-e45f4af12e68

Apache Kafka: Streams Replication Manager Prefixless Replication https://blog.cloudera.com/streams-replication-manager-prefixless-replication-part-1/

Doom on Bacteria https://www.rockpapershotgun.com/you-can-play-doom-using-gut-bacteria-but-the-framerate-is-atrocious

Enterprises Using Open Source LLMs https://venturebeat.com/ai/how-enterprises-are-using-open-source-llms-16-examples/

Flink Deep Dive https://www.waitingforcode.com/apache-flink/apache-flink-cluster-components-deep-dive/read

A Cheat Sheet for RAG https://blog.llamaindex.ai/a-cheat-sheet-and-some-recipes-for-building-advanced-rag-803a9d94c41b

Prompt Engineering Guides https://github.com/dair-ai/Prompt-Engineering-Guide

https://platform.openai.com/docs/guides/prompt-engineering/six-strategies-for-getting-better-results

Hikari Connection Pool https://medium.com/@guptadiksha88/hikari-cp-efficient-database-connection-pooling-d458c0bdf7df

LLM Prompting https://www.infoq.com/articles/large-language-models-llms-prompting/

Incremental Iceberg https://netflixtechblog.com/incremental-processing-using-netflix-maestro-and-apache-iceberg-b8ba072ddeeb

Gen AI Images https://rmoff.net/2023/12/07/productivity-tools-ai-image-generators/

Java Links https://graciano.dev/2023/08/03/weekend-reading-list-187/

IoT with MQTT & NiFi https://www.baeldung.com/iot-data-pipeline-mqtt-nifi

CDC with NiFi and Snowflake https://www.clearpeaks.com/change-data-capture-cdc-with-nifi-and-snowflake/

Host Apache NiFi with Docker https://medium.com/geekculture/host-a-fully-persisted-apache-nifi-service-with-docker-ffaa6a5f54a3

Videos

Seven Videos on Real-Time Streaming https://medium.com/@tspann/seven-videos-on-real-time-streaming-02711320afa8

Unlocking Financial Data with Real-Time Pipelines (OSACon 2023) https://www.youtube.com/watch?v=Q7gF7m4yFi4&ab_channel=OSACon

Processing Cisco ASA Logs with CFM https://medium.com/cloudera-inc/processing-cisco-asa-logs-with-cloudera-flow-management-f09cdf7382c3

Collecting NetFlow Records with Cloudera DataFlow https://medium.com/cloudera-inc/collecting-netflow-records-with-cloudera-dataflow-f47d9f57c98

Events

Feb 8, 2024: NYC. https://www.meetup.com/new-york-open-source-data-infrastructure-meetup/events/297484047/

  • 18:00 - 18:30 Welcome: Networking & snacks
  • 18:30 - 18:35 Kickoff: Welcome Aiven
  • 18:35 - 19:00 A Guide to Product Experimentation (Erin Mikail Staples, LaunchDarkly)
  • 19:00 - 19:30 Building Real-time Pipelines: A Case Study with Transit Data (Tim Spann, Cloudera)
  • 19:30 ~ 21:00 Food & networking

Feb 2024: Webinar https://www.cloudera.com/about/events/webinars/stay-ahead-of-cyber-threats-by-utilizing-data-in-motion.html?utm_medium=virtual-event&utm_source=resources-module&keyplay=ALL&utm_campaign=FY25-Q1-CorporateWebinar-AMER-cyber-threats&cid=701Hr000001pXCQIA2

Feb 20, 2024: 12-1PM EST. Virtual. Azure Data Tech Groups: DBA Fundamentals Group https://www.meetup.com/dba-fundamentals-group/events/296855261/

Feb 28, 2024: NYC. Cloudera Meetup. Flink https://www.meetup.com/futureofdata-princeton/events/298661947/

Feb 29, 2024: Virtual. Conf42 Python. https://www.conf42.com/Python_2024_Tim_Spann_apache_nifi_2_processors

https://www.conf42.com/Python_2024_Karin_Wolok_nifi__kafka_risingwave_iceberg_llm

March 5, 2024: Princeton. Meetup. GenAI. https://www.meetup.com/applied-generative-artificial-intelligence-applications/

March 15, 2024: TCF Pro. Princeton, NJ. IEEE Information Technology Professional Conference at the Trenton Computer Festival. https://princetonacm.acm.org/tcfpro/

April 2024: XtremeJ 2024. Virtual. https://xtremej.dev/2023/schedule/

May 8-9, 2024: Data Summit 2024. Boston, MA. https://www.dbta.com/DataSummit/2024/default.aspx

Cloudera Events https://www.cloudera.com/about/events.html

More Events: https://www.linkedin.com/pulse/schedule-2024-tim-spann--y4coe

Code

  • https://github.com/tspannhw/FLaNK-python-watsonx-processor
  • https://github.com/tspannhw/FLaNK-DatabaseTableSchemaRegistry
  • https://github.com/tspannhw/FLaNK-CDW
  • https://github.com/tspannhw/FLaNK-VectorDB
  • https://github.com/tspannhw/FLaNK-RPI5
  • https://github.com/tspannhw/FLaNK-EdgeAI
  • https://github.com/kevinbtalbert/NiFi-Flows-Demos
  • https://github.com/DataSQRL/apirag
  • https://github.com/tspannhw/FLaNK-python-ExtractCompanyName-processor
  • https://github.com/ThomasVitale/llm-apps-java-langchain4j

Models

  • https://github.com/zhuyiche/llava-phi
  • https://github.com/SkunkworksAI/BakLLaVA
  • https://github.com/stanford-futuredata/ColBERT
  • https://github.com/state-spaces/mamba

Data

  • https://github.com/jonkeegan/faa-navigation-waypoints/tree/main?ref=beautifulpublicdata.com

Tools

  • https://github.com/wxywb/history_rag
  • https://github.com/video-db/StreamRAG
  • https://github.com/Fanghua-Yu/SUPIR
  • https://github.com/robocorp/robocorp
  • https://github.com/danielmiessler/fabric
  • https://posit-dev.github.io/great-tables/articles/intro.html
  • https://github.com/huggingface/datatrove
  • https://github.com/huggingface/setfit
  • https://github.com/huggingface/text-generation-inference
  • https://github.com/huggingface/distil-whisper
  • https://github.com/huggingface/discord-bots
  • https://github.com/explodinggradients/ragas
  • https://github.com/willie-engelbrecht/ParseMultiLevelJSON-NiFiRecordProcessors
  • https://github.com/ThomasVitale/llm-apps-java-spring-ai
  • https://github.com/beehive-lab/TornadoVM
  • https://www.autobackend.dev/
  • https://trpc.io/docs/quickstart
  • https://github.com/sqlchat/sqlchat
  • https://github.com/AI4Finance-Foundation/FinGPT/tree/master/fingpt/FinGPT_Forecaster
  • https://github.com/openvinotoolkit/awesome-openvino
  • https://github.com/intel/openvino-ai-plugins-gimp
  • https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html
  • https://github.com/bes-dev/stable_diffusion.openvino
  • https://github.com/samontab/llm_sentiment
  • https://github.com/BMW-InnovationLab/BMW-IntelOpenVINO-Detection-Inference-API
  • https://github.com/RapidAI/RapidOCR
  • https://github.com/Hmm466/OpenVINO-Java-API
  • https://github.com/openvinotoolkit/openvino_notebooks
  • https://github.com/openvinotoolkit/openvino
  • https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/267-distil-whisper-asr
  • https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/264-qrcode-monster
  • https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/262-softvc-voice-conversion
  • https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/257-llava-multimodal-chatbot
  • https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/270-sound-generation-audioldm2
  • https://www.dfrobot.com/product-2778.html?tracking=65b9f32d03987
  • https://intelpython.github.io/DPEP/
  • https://github.com/kentontroy/cloudera_cml_llm_rag
  • https://community.cloudera.com/t5/Support-Questions/SSE-Client-in-Apache-NiFi/m-p/359742
  • https://github.com/PKU-YuanGroup/MoE-LLaVA
  • https://github.com/collabora/WhisperFusion
  • https://github.com/deepseek-ai/DeepSeek-Coder
  • https://llava-vl.github.io/blog/2024-01-30-llava-1-6/
  • https://github.com/mkjt2/lockbox
  • https://www.pipeless.ai/
  • https://github.com/pipeless-ai/pipeless
  • https://github.com/dennislee22/deepspeed-train-CML
  • https://github.com/microsoft/TransformerCompression
  • https://github.com/microsoft/PubSec-Info-Assistant
  • https://github.com/microsoft/Qcodes
  • https://onnxruntime.ai/
  • https://onnxruntime.ai/docs/tutorials/iot-edge/rasp-pi-cv.html#prerequisites
  • https://github.com/microsoft/hummingbird
  • https://microsoft.github.io/promptflow/how-to-guides/faq.html#openai-1-x-support
  • https://github.com/microsoft/XmlNotepad
  • https://learn.microsoft.com/en-us/openapi/kiota/overview
  • https://github.com/microsoft/kiota-java
  • https://metaflow.org/
  • https://netflix.github.io/atlas-docs/
  • https://github.com/RedWedgeX/obs-sessionize-title-updater
  • https://benchmark.clickhouse.com/
  • https://blog.allenai.org/olmo-open-language-model-87ccfc95f580
  • https://gpt4all.io/index.html
  • https://datastrato.ai/docs/0.3.1/
  • https://github.com/plasma-umass/scalene
  • https://github.com/recap-build/hive-metastore-standalone
  • https://github.com/naushadh/hive-metastore
  • https://wimbd.apps.allenai.org/
  • https://github.com/allenai/dolma
  • https://github.com/umd-huang-lab/Mementos
  • https://mitenmit.github.io/gpt/
  • https://github.com/msasikanth/twine
  • https://github.com/NVIDIA/NeMo-Guardrails
  • https://github.com/xmlking/macbooksetup
  • https://github.com/xmlking/ai-experiments
  • https://github.com/adamcohenhillel/ADeus
  • https://redpanda.com/blog/using-apache-nifi-with-redpanda-kafka
  • https://github.com/seaweedfs/seaweedfs
  • https://github.com/AILab-CVC/YOLO-World
  • https://github.com/OpenBMB/MiniCPM
  • https://github.com/Avaiga/taipy
  • https://spectrum.ieee.org/non-line-of-sight-infrared
  • https://github.com/highlight/highlight
  • https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/256-bark-text-to-audio
  • https://rye-up.com/
  • https://github.com/karpathy/ng-video-lecture
  • https://lmstudio.ai/
  • https://www.graalvm.org/

© 2020-2024 Tim Spann

https://github.com/tspannhw/SpeakerProfile

Tim Spann is a Principal Developer Advocate for Zilliz and Milvus. He works with Milvus, Towhee, Attu, GPTCache, Generative AI, HuggingFace, Python, Java, A